Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets. In this work, we unify the current dominant approaches for semi-supervised learning to produce a new algorithm, MixMatch, that guesses low-entropy labels for data-augmented unlabeled examples and mixes labeled and unlabeled data using MixUp. MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled data amounts. For example, on CIFAR-10 with 250 labels, we reduce error rate by a factor of 4 (from 38% to 11%) and by a factor of 2 on STL-10. We also demonstrate how MixMatch can help achieve a dramatically better accuracy-privacy trade-off for differential privacy. Finally, we perform an ablation study to tease apart which components of MixMatch are most important for its success. Code is attached.
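The two ingredients named in the abstract, guessing low-entropy labels for augmented unlabeled examples and mixing examples with MixUp, can be illustrated with a minimal NumPy sketch. The abstract does not give these details, so everything specific here is an assumption for illustration: averaging predictions over several augmentations, temperature sharpening with `T`, drawing the mixing weight from a Beta(`alpha`, `alpha`) distribution, and all function names and default values. It is not the authors' attached code.

```python
import numpy as np


def sharpen(p, T=0.5):
    """Lower the entropy of each row of p: raise to 1/T, then renormalize.

    T is a hypothetical temperature; T < 1 makes the distribution peakier.
    """
    p = np.asarray(p, dtype=float) ** (1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)


def guess_labels(probs_per_augmentation, T=0.5):
    """Guess a label distribution for each unlabeled example.

    probs_per_augmentation has shape (K, batch, classes): the model's
    predicted class probabilities for K augmented copies of each example.
    Assumption: average over the K copies, then sharpen.
    """
    mean = np.mean(probs_per_augmentation, axis=0)
    return sharpen(mean, T)


def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """Convex combination of two (input, label) pairs.

    Assumption: the weight is drawn from Beta(alpha, alpha) and then
    replaced by max(lam, 1 - lam), so the result stays closer to (x1, y1).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2


def entropy(p):
    """Shannon entropy (in nats) of each row of p."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)
```

In this sketch, `guess_labels` would be applied to the model's predictions on unlabeled batches, and `mixup` to pairs drawn from the combined labeled and unlabeled data; how the resulting examples enter the training loss is not covered here.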
Author Information
David Berthelot (Google Brain)
Nicholas Carlini (Google)
Ian Goodfellow (Google Brain)
Nicolas Papernot (University of Toronto)
Avital Oliver (Google Brain)
Colin A Raffel (Google Brain)
My research focuses on machine learning techniques for sequential data. I am currently a resident at Google Brain. I recently completed a PhD in Electrical Engineering at Columbia University in LabROSA, supervised by Dan Ellis. My thesis focused on learning-based methods for comparing sequences. In 2010, I received a Master's in Music, Science and Technology from Stanford University's CCRMA, supervised by Julius O. Smith III. I did my undergrad at Oberlin College, where I majored in Mathematics.
More from the Same Authors
- 2021: Measuring Robustness to Natural Distribution Shifts in Image Classification
  Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt
- 2022: Part-Based Models Improve Adversarial Robustness
  Chawin Sitawarin · Kornrapat Pongmala · Yizheng Chen · Nicholas Carlini · David Wagner
- 2020: Contributed Talk #2: On the (Im)Possibility of Private Machine Learning through Instance Encoding
  Nicholas Carlini
- 2020 Poster: Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples
  Samarth Sinha · Zhengli Zhao · Anirudh Goyal · Colin A Raffel · Augustus Odena
- 2020 Poster: FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
  Kihyuk Sohn · David Berthelot · Nicholas Carlini · Zizhao Zhang · Han Zhang · Colin A Raffel · Ekin Dogus Cubuk · Alexey Kurakin · Chun-Liang Li
- 2020 Poster: Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
  Sumanth Dathathri · Krishnamurthy Dvijotham · Alexey Kurakin · Aditi Raghunathan · Jonathan Uesato · Rudy Bunel · Shreya Shankar · Jacob Steinhardt · Ian Goodfellow · Percy Liang · Pushmeet Kohli
- 2018: Poster Session 1
  Evan Casey · Colin A Raffel · Jonathan Simon · Juncheng Li · Robert Saunders · Petra Gemeinboeck · Eunsu Kang · Songwei Ge · Curtis Hawthorne · Anna Huang · Ting-Wei Su · Eric Chu · Memo Akten · Sonam Damani · Khyatti Gupta · Dilpreet Singh · Patrick Hutchings
- 2018 Poster: Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
  Avital Oliver · Augustus Odena · Colin A Raffel · Ekin Dogus Cubuk · Ian Goodfellow
- 2018 Spotlight: Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
  Avital Oliver · Augustus Odena · Colin A Raffel · Ekin Dogus Cubuk · Ian Goodfellow