This paper shows that deep learning, i.e., neural networks trained by SGD, can learn in polynomial time any function class that is learnable in polynomial time by some algorithm, including parities. This universality result is further shown to be robust, i.e., it holds even in the presence of poly-noise on the gradients, which gives a separation between deep learning and statistical query (SQ) algorithms, since the latter are not comparably universal due to cases such as parities. It also shows that SGD-based deep learning does not suffer from the limitations of the perceptron discussed by Minsky-Papert '69. The paper further complements this result with a lower bound on the generalization error of descent algorithms, which implies in particular that the robust universality breaks down if the gradients are averaged over sufficiently large batches of samples, as in full GD, rather than over fewer samples, as in SGD.
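As a purely illustrative sketch of the setting (not the paper's construction), the snippet below trains a small one-hidden-layer ReLU network with plain SGD, using one fresh sample per step, to fit a k-sparse parity over {-1,+1}^d. All names and hyperparameters (d, k, hidden width, step count, learning rate) are assumptions chosen for this toy example, and convergence is not guaranteed for other settings.

```python
# Toy sketch (assumptions throughout): fit a one-hidden-layer ReLU network
# with plain SGD to a k-sparse parity over {-1,+1}^d. This is only an
# illustration of the kind of target (parities) discussed in the abstract,
# not the paper's actual construction or proof.
import numpy as np

rng = np.random.default_rng(0)
d, k, hidden, n_steps, lr = 10, 3, 64, 100_000, 0.05
support = rng.choice(d, size=k, replace=False)  # hidden parity coordinates

def sample_batch(n):
    x = rng.integers(0, 2, size=(n, d)).astype(float)
    y = 1.0 - 2.0 * (x[:, support].sum(axis=1) % 2)  # parity label in {-1, +1}
    return 2.0 * x - 1.0, y                          # inputs mapped to {-1, +1}

# He-style initialization of the network parameters.
W1 = rng.normal(0, np.sqrt(2.0 / d), size=(hidden, d))
b1 = np.zeros(hidden)
w2 = rng.normal(0, np.sqrt(2.0 / hidden), size=hidden)
b2 = 0.0

for step in range(n_steps):
    x, y = sample_batch(1)                 # SGD: one fresh sample per update
    h = np.maximum(W1 @ x[0] + b1, 0.0)    # hidden ReLU activations
    pred = w2 @ h + b2
    err = pred - y[0]                      # gradient of 0.5 * (pred - y)^2
    grad_w2 = err * h
    grad_b2 = err
    grad_h = err * w2 * (h > 0)
    grad_W1 = np.outer(grad_h, x[0])
    grad_b1 = grad_h
    # Gradient noise could be injected here; per the abstract, the
    # universality result tolerates poly-noise on these gradients.
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    w2 -= lr * grad_w2; b2 -= lr * grad_b2

xt, yt = sample_batch(2000)
acc = np.mean(np.sign(np.maximum(xt @ W1.T + b1, 0.0) @ w2 + b2) == yt)
print(f"test accuracy on the {k}-parity: {acc:.3f}")
```

Per the abstract, the contrast of interest is with averaging gradients over large batches (full GD), the regime where the paper's lower bound applies; the sketch above instead updates on a single sample per step.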
Author Information
Emmanuel Abbe (Princeton University)
Colin Sandon (MIT)
More from the Same Authors
- 2021 Spotlight: On the Power of Differentiable Learning versus PAC and SQ Learning
  Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro
- 2021 Poster: On the Power of Differentiable Learning versus PAC and SQ Learning
  Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro
- 2018 Poster: Chaining Mutual Information and Tightening Generalization Bounds
  Amir Asadi · Emmanuel Abbe · Sergio Verdu
- 2017 Poster: Nonbacktracking Bounds on the Influence in Independent Cascade Models
  Emmanuel Abbe · Sanjeev Kulkarni · Eun Jee Lee
- 2016 Poster: Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation
  Emmanuel Abbe · Colin Sandon
- 2016 Oral: Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation
  Emmanuel Abbe · Colin Sandon
- 2015 Poster: Recovering Communities in the General Stochastic Block Model Without Knowing the Parameters
  Emmanuel Abbe · Colin Sandon