Timezone: »
Spotlight
Generalization Bounds for Neural Networks via Approximate Description Length
Amit Daniely · Elad Granot
We investigate the sample complexity of networks with bounds on the magnitude of its weights.
In particular, we consider the class
\[
\cn = \left\{W_t\circ\rho\circ W_{t-1}\circ\rho\ldots\circ \rho\circ W_{1} : W_1,\ldots,W_{t-1}\in M_{d\times d}, W_t\in M_{1,d} \right\}
\]
where the spectral norm of each $W_i$ is bounded by $O(1)$, the Frobenius norm is bounded by $R$, and $\rho$ is the sigmoid function $\frac{e^x}{1 + e^x}$ or the smoothened ReLU function $ \ln\left(1 + e^x\right)$.
We show that for any depth $t$, if the inputs are in $[-1,1]^d$, the sample complexity of $\cn$ is $\tilde O\left(\frac{dR^2}{\epsilon^2}\right)$. This bound is optimal up to log-factors, and substantially improves over the previous state of the art of $\tilde O\left(\frac{d^2R^2}{\epsilon^2}\right)$, that was established in a recent line of work.
We furthermore show that this bound remains valid if instead of considering the magnitude of the $W_i$'s, we consider the magnitude of $W_i - W_i^0$, where $W_i^0$ are some reference matrices, with spectral norm of $O(1)$. By taking the $W_i^0$ to be the matrices in the onset of the training process, we get sample complexity bounds that are sub-linear in the number of parameters, in many {\em typical} regimes of parameters.
To establish our results we develop a new technique to analyze the sample complexity of families $\ch$ of predictors.
We start by defining a new notion of a randomized approximate description of functions $f:\cx\to\reals^d$. We then show that if there is a way to approximately describe functions in a class $\ch$ using $d$ bits, then $\frac{d}{\epsilon^2}$ examples suffices to guarantee uniform convergence. Namely, that the empirical loss of all the functions in the class is $\epsilon$-close to the true loss. Finally, we develop a set of tools for calculating the approximate description length of classes of functions that can be presented as a composition of linear function classes and non-linear functions.
Author Information
Amit Daniely (Hebrew University and Google Research)
Elad Granot (Hebrew University)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Poster: Generalization Bounds for Neural Networks via Approximate Description Length »
Thu. Dec 12th 01:00 -- 03:00 AM Room East Exhibition Hall B + C #228
More from the Same Authors
-
2023 Poster: Most Neural Networks Are Almost Learnable »
Amit Daniely · Nati Srebro · Gal Vardi -
2023 Poster: Multiclass Boosting: Simple and Intuitive Weak Learning Criteria »
Nataly Brukhim · Amit Daniely · Yishay Mansour · Shay Moran -
2023 Poster: Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy »
Amit Daniely · Nati Srebro · Gal Vardi -
2020 Poster: Neural Networks Learning and Memorization with (almost) no Over-Parameterization »
Amit Daniely -
2020 Poster: Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations »
Amit Daniely · Hadas Shacham -
2020 Spotlight: Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations »
Amit Daniely · Hadas Shacham -
2020 Poster: Learning Parities with Neural Networks »
Amit Daniely · Eran Malach -
2020 Poster: Hardness of Learning Neural Networks with Natural Weights »
Amit Daniely · Gal Vardi -
2020 Oral: Learning Parities with Neural Networks »
Amit Daniely · Eran Malach -
2019 Poster: Locally Private Learning without Interaction Requires Separation »
Amit Daniely · Vitaly Feldman -
2017 Poster: SGD Learns the Conjugate Kernel Class of the Network »
Amit Daniely -
2016 Poster: Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity »
Amit Daniely · Roy Frostig · Yoram Singer -
2013 Poster: More data speeds up training time in learning halfspaces over sparse vectors »
Amit Daniely · Nati Linial · Shai Shalev-Shwartz -
2013 Spotlight: More data speeds up training time in learning halfspaces over sparse vectors »
Amit Daniely · Nati Linial · Shai Shalev-Shwartz -
2012 Poster: Multiclass Learning Approaches: A Theoretical Comparison with Implications »
Amit Daniely · Sivan Sabato · Shai Shalev-Shwartz -
2012 Spotlight: Multiclass Learning Approaches: A Theoretical Comparison with Implications »
Amit Daniely · Sivan Sabato · Shai Shalev-Shwartz