The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized?
In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be done simply by SGD (stochastic gradient descent) or its variants, in polynomial time and using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network.
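To make the training setup concrete, here is a minimal sketch (not the paper's construction or proof setting): plain minibatch SGD on a heavily overparameterized two-layer ReLU network, fit to data generated by a much smaller target network. The widths, step size, number of steps, and the target itself are arbitrary illustrative choices.

```python
# Illustrative sketch only: SGD on an overparameterized two-layer ReLU network,
# trained on data from a much smaller "target" network. All hyperparameters are
# arbitrary choices for demonstration, not the paper's setting.
import numpy as np

rng = np.random.default_rng(0)
d, m_target, m_learner = 10, 5, 2000   # input dim, target width, (much larger) learner width

# Ground-truth concept: a small two-layer network.
W_star = rng.normal(size=(m_target, d)) / np.sqrt(d)
a_star = rng.normal(size=m_target) / np.sqrt(m_target)

def target(X):
    return np.maximum(X @ W_star.T, 0.0) @ a_star

def sample(n):
    X = rng.normal(size=(n, d)) / np.sqrt(d)
    return X, target(X)

# Overparameterized learner: random init; only the first layer is trained
# (a common simplification in this line of work).
W = rng.normal(size=(m_learner, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m_learner) / np.sqrt(m_learner)

def predict(X, W):
    return np.maximum(X @ W.T, 0.0) @ a

lr, batch = 0.5, 32
for step in range(2000):
    X, y = sample(batch)
    H = X @ W.T                        # pre-activations, shape (batch, m_learner)
    err = predict(X, W) - y            # residuals, shape (batch,)
    # Gradient of 0.5 * mean squared error with respect to W.
    grad = ((err[:, None] * (H > 0.0)) * a).T @ X / batch
    W -= lr * grad

X_test, y_test = sample(1000)
print("test MSE:", np.mean((predict(X_test, W) - y_test) ** 2))
```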
On the technical side, our analysis goes beyond the so-called NTK (neural tangent kernel) linearization of neural networks used in prior works. We establish a new notion of quadratic approximation of the neural network and connect it to the SGD theory of escaping saddle points.
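As a rough illustration of the distinction (generic Taylor expansions of a network f(x; W) around its random initialization W_0, not the paper's precise definitions): the NTK view keeps only the first-order term, while a quadratic approximation also retains the second-order term.

```latex
% Schematic comparison only; generic expansions around the initialization W_0.
\begin{align*}
  \text{NTK (linear) view:}\quad
    f(x; W) &\approx f(x; W_0) + \langle \nabla_W f(x; W_0),\, W - W_0 \rangle \\
  \text{quadratic view:}\quad
    f(x; W) &\approx f(x; W_0) + \langle \nabla_W f(x; W_0),\, W - W_0 \rangle
      + \tfrac{1}{2}\,(W - W_0)^\top \nabla_W^2 f(x; W_0)\,(W - W_0)
\end{align*}
```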
Author Information
Zeyuan Allen-Zhu (Microsoft Research)
Yuanzhi Li (Princeton)
Yingyu Liang (University of Wisconsin Madison)
More from the Same Authors
- 2022: Domain Generalization with Nuclear Norm Regularization
  Zhenmei Shi · Yifei Ming · Ying Fan · Frederic Sala · Yingyu Liang
- 2022: Best of Both Worlds: Towards Adversarial Robustness with Transduction and Rejection
  Nils Palumbo · Yang Guo · Xi Wu · Jiefeng Chen · Yingyu Liang · Somesh Jha
- 2020 Poster: Functional Regularization for Representation Learning: A Unified Theoretical Perspective
  Siddhant Garg · Yingyu Liang
- 2019 Poster: N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules
  Shengchao Liu · Mehmet Demirel · Yingyu Liang
- 2019 Spotlight: N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules
  Shengchao Liu · Mehmet Demirel · Yingyu Liang
- 2019 Poster: On the Convergence Rate of Training Recurrent Neural Networks
  Zeyuan Allen-Zhu · Yuanzhi Li · Zhao Song
- 2019 Poster: Robust Attribution Regularization
  Jiefeng Chen · Xi Wu · Vaibhav Rastogi · Yingyu Liang · Somesh Jha
- 2019 Poster: What Can ResNet Learn Efficiently, Going Beyond Kernels?
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2019 Poster: Complexity of Highly Parallel Non-Smooth Convex Optimization
  Sebastien Bubeck · Qijia Jiang · Yin-Tat Lee · Yuanzhi Li · Aaron Sidford
- 2019 Spotlight: Complexity of Highly Parallel Non-Smooth Convex Optimization
  Sebastien Bubeck · Qijia Jiang · Yin-Tat Lee · Yuanzhi Li · Aaron Sidford
- 2019 Poster: Can SGD Learn Recurrent Neural Networks with Provable Generalization?
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2019 Poster: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
  Yuanzhi Li · Colin Wei · Tengyu Ma
- 2019 Spotlight: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
  Yuanzhi Li · Colin Wei · Tengyu Ma
- 2018 Poster: How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
  Zeyuan Allen-Zhu
- 2018 Poster: Online Improper Learning with an Approximation Oracle
  Elad Hazan · Wei Hu · Yuanzhi Li · Zhiyuan Li
- 2018 Poster: Byzantine Stochastic Gradient Descent
  Dan Alistarh · Zeyuan Allen-Zhu · Jerry Li
- 2018 Poster: Natasha 2: Faster Non-Convex Optimization Than SGD
  Zeyuan Allen-Zhu
- 2018 Poster: NEON2: Finding Local Minima via First-Order Oracles
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2018 Spotlight: Natasha 2: Faster Non-Convex Optimization Than SGD
  Zeyuan Allen-Zhu
- 2018 Poster: Is Q-Learning Provably Efficient?
  Chi Jin · Zeyuan Allen-Zhu · Sebastien Bubeck · Michael Jordan
- 2018 Poster: The Lingering of Gradients: How to Reuse Gradients Over Time
  Zeyuan Allen-Zhu · David Simchi-Levi · Xinshang Wang
- 2018 Poster: Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
  Yuanzhi Li · Yingyu Liang
- 2018 Spotlight: Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
  Yuanzhi Li · Yingyu Liang
- 2017 Poster: Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
  Zeyuan Allen-Zhu · Elad Hazan · Wei Hu · Yuanzhi Li
- 2017 Spotlight: Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
  Zeyuan Allen-Zhu · Elad Hazan · Wei Hu · Yuanzhi Li
- 2016 Poster: Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
  Zeyuan Allen-Zhu · Yang Yuan · Karthik Sridharan
- 2016 Poster: Optimal Black-Box Reductions Between Optimization Objectives
  Zeyuan Allen-Zhu · Elad Hazan
- 2016 Poster: Even Faster SVD Decomposition Yet Without Agonizing Pain
  Zeyuan Allen-Zhu · Yuanzhi Li