Poster
Neural Networks Learning and Memorization with (almost) no Over-Parameterization
Amit Daniely
Many recent results have established polynomial-time learnability of various models via neural network algorithms
(e.g. \cite{andoni2014learning, daniely2016toward, daniely2017sgd, cao2019generalization, ziwei2019polylogarithmic, zou2019improved, ma2019comparative, du2018gradient, arora2019fine, song2019quadratic, oymak2019towards, ge2019mildly, brutzkus2018sgd}).
However, unless the model is linearly separable~\cite{brutzkus2018sgd} or the activation is a polynomial~\cite{ge2019mildly}, these results require very large networks -- much larger than what is needed for the mere existence of a good predictor.
In this paper we prove that SGD on depth-two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with {\em near optimal} network size, sample complexity, and runtime. In particular, we show that SGD on a depth-two network with $\tilde{O}\left(\frac{m}{d}\right)$ hidden neurons (and hence $\tilde{O}(m)$ parameters) can memorize $m$ random labeled points on $\mathbb{S}^{d-1}$.
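To make the memorization statement concrete, below is a minimal Python/PyTorch sketch of the experiment it describes: gradient descent on a depth-two ReLU network with on the order of $m/d$ hidden units, fit to $m$ random $\pm 1$-labeled points on the unit sphere. This is an illustration, not the paper's construction; the width constant, learning rate, step count, square loss, and full-batch updates (a simplification of SGD) are all assumptions for the sake of a runnable example.

```python
# Sketch (assumptions, not the paper's exact algorithm): a depth-two ReLU
# network with ~m/d hidden units trained to memorize m random labels on S^{d-1}.
import torch

d, m = 100, 2000                      # input dimension, number of samples
width = 8 * max(1, m // d)            # ~O(m/d) hidden neurons; constant 8 is a guess
                                      # standing in for the hidden log factors

# m random points on the unit sphere S^{d-1} with uniform +/-1 labels
X = torch.randn(m, d)
X = X / X.norm(dim=1, keepdim=True)
y = torch.randint(0, 2, (m,)).float() * 2 - 1

net = torch.nn.Sequential(
    torch.nn.Linear(d, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(net.parameters(), lr=0.05)

for step in range(5000):              # full-batch GD as a stand-in for SGD
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(net(X).squeeze(1), y)
    loss.backward()
    opt.step()

# Memorization here means fitting the signs of all m random labels.
acc = (torch.sign(net(X).squeeze(1)) == y).float().mean().item()
print(f"training accuracy: {acc:.3f}")
```

Note that the network has roughly $\tilde{O}(m)$ parameters ($\text{width} \times (d+2)$), matching the parameter count in the statement above, even though it is far smaller than the networks required by earlier analyses.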
Author Information
Amit Daniely (Hebrew University and Google Research)
More from the Same Authors
- 2020 Poster: Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations
  Amit Daniely · Hadas Shacham
- 2020 Spotlight: Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations
  Amit Daniely · Hadas Shacham
- 2020 Poster: Learning Parities with Neural Networks
  Amit Daniely · Eran Malach
- 2020 Poster: Hardness of Learning Neural Networks with Natural Weights
  Amit Daniely · Gal Vardi
- 2020 Oral: Learning Parities with Neural Networks
  Amit Daniely · Eran Malach
- 2019 Poster: Locally Private Learning without Interaction Requires Separation
  Amit Daniely · Vitaly Feldman
- 2019 Poster: Generalization Bounds for Neural Networks via Approximate Description Length
  Amit Daniely · Elad Granot
- 2019 Spotlight: Generalization Bounds for Neural Networks via Approximate Description Length
  Amit Daniely · Elad Granot
- 2017 Poster: SGD Learns the Conjugate Kernel Class of the Network
  Amit Daniely
- 2016 Poster: Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
  Amit Daniely · Roy Frostig · Yoram Singer
- 2013 Poster: More data speeds up training time in learning halfspaces over sparse vectors
  Amit Daniely · Nati Linial · Shai Shalev-Shwartz
- 2013 Spotlight: More data speeds up training time in learning halfspaces over sparse vectors
  Amit Daniely · Nati Linial · Shai Shalev-Shwartz
- 2012 Poster: Multiclass Learning Approaches: A Theoretical Comparison with Implications
  Amit Daniely · Sivan Sabato · Shai Shalev-Shwartz
- 2012 Spotlight: Multiclass Learning Approaches: A Theoretical Comparison with Implications
  Amit Daniely · Sivan Sabato · Shai Shalev-Shwartz