Timezone: »
Large neural networks trained in the overparameterized regime are able to fit noise to zero train error. Recent work of Nakkiran and Bansal has empirically observed that such networks behave as “conditional samplers” from the noisy distribution. That is, they replicate the noise in the train data to unseen examples. We give a theoretical framework for studying this conditional sampling behavior in the context of learning theory. We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data. We show that samplers, while being bad classifiers, can be good teachers. Concretely, we prove that distillation from samplers is guaranteed to produce a student which approximates the Bayes optimal classifier. Finally, we show that some common learning algorithms (e.g., Nearest-Neighbours and Kernel Machines) can often generate samplers when applied in the overparameterized regime.
Author Information
Gal Kaplun (Harvard University)
Eran Malach (Hebrew University Jerusalem Israel)
Preetum Nakkiran (Apple)
Shai Shalev-Shwartz (Mobileye & HUJI)
More from the Same Authors
-
2021 Spotlight: On the Power of Differentiable Learning versus PAC and SQ Learning »
Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro -
2022 : Deconstructing Distributions: A Pointwise Framework of Learning »
Gal Kaplun · Nikhil Ghosh · Saurabh Garg · Boaz Barak · Preetum Nakkiran -
2023 Poster: Resource Tradeoffs for Deep Feature Learning: Data, Compute, Width, and Luck »
Benjamin Edelman · Surbhi Goel · Sham Kakade · Eran Malach · Cyril Zhang -
2023 Poster: When Does Optimizing a Proper Loss Yield Calibration? »
Jarosław Błasiok · Parikshit Gopalan · Lunjia Hu · Preetum Nakkiran -
2022 Poster: Benign, Tempered, or Catastrophic: Toward a Refined Taxonomy of Overfitting »
Neil Mallinar · James Simon · Amirhesam Abedsoltan · Parthe Pandit · Misha Belkin · Preetum Nakkiran -
2022 Poster: Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit »
Boaz Barak · Benjamin Edelman · Surbhi Goel · Sham Kakade · Eran Malach · Cyril Zhang -
2022 Poster: What You See is What You Get: Principled Deep Learning via Distributional Generalization »
Bogdan Kulynych · Yao-Yuan Yang · Yaodong Yu · Jarosław Błasiok · Preetum Nakkiran -
2021 : Q&A with Shai Shalev-Shwartz »
Shai Shalev-Shwartz -
2021 : Deep Learning: Success, Failure, and the Border between them, Shai Shalev-Shwartz »
Shai Shalev-Shwartz -
2021 Poster: On the Power of Differentiable Learning versus PAC and SQ Learning »
Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro -
2020 Poster: The Implications of Local Correlation on Learning Some Deep Functions »
Eran Malach · Shai Shalev-Shwartz -
2020 Poster: Learning Parities with Neural Networks »
Amit Daniely · Eran Malach -
2020 Oral: Learning Parities with Neural Networks »
Amit Daniely · Eran Malach -
2019 Poster: SGD on Neural Networks Learns Functions of Increasing Complexity »
Dimitris Kalimeris · Gal Kaplun · Preetum Nakkiran · Benjamin Edelman · Tristan Yang · Boaz Barak · Haofeng Zhang -
2019 Spotlight: SGD on Neural Networks Learns Functions of Increasing Complexity »
Dimitris Kalimeris · Gal Kaplun · Preetum Nakkiran · Benjamin Edelman · Tristan Yang · Boaz Barak · Haofeng Zhang -
2019 Poster: Is Deeper Better only when Shallow is Good? »
Eran Malach · Shai Shalev-Shwartz -
2017 Poster: Decoupling "when to update" from "how to update" »
Eran Malach · Shai Shalev-Shwartz -
2015 Poster: Beyond Convexity: Stochastic Quasi-Convex Optimization »
Elad Hazan · Kfir Y. Levy · Shai Shalev-Shwartz