Poster
What Can ResNet Learn Efficiently, Going Beyond Kernels?
Zeyuan Allen-Zhu · Yuanzhi Li
Wed Dec 11 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #231
How can neural networks such as ResNet \emph{efficiently} learn CIFAR-10 with test accuracy above $96\%$, while other methods, especially kernel methods, fall relatively behind? Can we provide theoretical justification for this gap?
Recently, there has been an influential line of work relating neural networks to kernels in the over-parameterized regime, proving that they can learn certain concept classes that are also learnable by kernels, with similar test error. Yet, can neural networks provably learn some concept class \emph{better} than kernels?
We answer this positively in the distribution-free setting. We prove neural networks can efficiently learn a notable class of functions, including those defined by three-layer residual networks with smooth activations, without any distributional assumption.
At the same time, we prove there are simple functions in this class such that, with the same number of training examples, the test error obtained by neural networks can be \emph{much smaller} than that obtained by \emph{any} kernel method, including neural tangent kernels (NTK).
The main intuition is that \emph{multi-layer} neural networks can implicitly perform hierarchical learning using different layers, which reduces the sample complexity compared to ``one-shot'' learning algorithms such as kernel methods.
Finally, we also prove a computational complexity advantage of ResNet over other learning methods, including linear regression over arbitrary feature mappings.
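As an illustrative sketch of the concept class in question (not the paper's exact construction; all names and hyperparameters below are hypothetical), a three-layer residual network with a smooth activation can be written in a few lines of PyTorch. The skip connections let later layers refine features produced by earlier ones, the hierarchical behavior described above; a kernel method, by contrast, fits a linear predictor over one fixed feature map in a single shot.

```python
import torch
import torch.nn as nn

class ThreeLayerResNet(nn.Module):
    """Hypothetical sketch of a three-layer residual network with a
    smooth (tanh) activation, in the spirit of the class the abstract
    describes -- not the construction from the paper itself."""

    def __init__(self, d_in: int, width: int, d_out: int):
        super().__init__()
        self.embed = nn.Linear(d_in, width)    # input embedding layer
        self.block1 = nn.Linear(width, width)  # first residual block
        self.block2 = nn.Linear(width, width)  # second residual block
        self.head = nn.Linear(width, d_out)    # linear output layer
        self.act = nn.Tanh()                   # smooth activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.act(self.embed(x))
        h = h + self.act(self.block1(h))  # skip connection: h <- h + sigma(W1 h)
        h = h + self.act(self.block2(h))  # skip connection: h <- h + sigma(W2 h)
        return self.head(h)

# Hypothetical usage: plain SGD training, with no distributional assumption on x.
model = ThreeLayerResNet(d_in=32, width=512, d_out=1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
```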
Author Information
Zeyuan Allen-Zhu (Microsoft Research)
Yuanzhi Li (Princeton)
More from the Same Authors
- 2019 Poster: On the Convergence Rate of Training Recurrent Neural Networks
  Zeyuan Allen-Zhu · Yuanzhi Li · Zhao Song
- 2019 Poster: Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
  Zeyuan Allen-Zhu · Yuanzhi Li · Yingyu Liang
- 2019 Poster: Complexity of Highly Parallel Non-Smooth Convex Optimization
  Sebastien Bubeck · Qijia Jiang · Yin-Tat Lee · Yuanzhi Li · Aaron Sidford
- 2019 Spotlight: Complexity of Highly Parallel Non-Smooth Convex Optimization
  Sebastien Bubeck · Qijia Jiang · Yin-Tat Lee · Yuanzhi Li · Aaron Sidford
- 2019 Poster: Can SGD Learn Recurrent Neural Networks with Provable Generalization?
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2019 Poster: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
  Yuanzhi Li · Colin Wei · Tengyu Ma
- 2019 Spotlight: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
  Yuanzhi Li · Colin Wei · Tengyu Ma
- 2018 Poster: How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
  Zeyuan Allen-Zhu
- 2018 Poster: Online Improper Learning with an Approximation Oracle
  Elad Hazan · Wei Hu · Yuanzhi Li · Zhiyuan Li
- 2018 Poster: Byzantine Stochastic Gradient Descent
  Dan Alistarh · Zeyuan Allen-Zhu · Jerry Li
- 2018 Poster: Natasha 2: Faster Non-Convex Optimization Than SGD
  Zeyuan Allen-Zhu
- 2018 Poster: NEON2: Finding Local Minima via First-Order Oracles
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2018 Spotlight: Natasha 2: Faster Non-Convex Optimization Than SGD
  Zeyuan Allen-Zhu
- 2018 Poster: Is Q-Learning Provably Efficient?
  Chi Jin · Zeyuan Allen-Zhu · Sebastien Bubeck · Michael Jordan
- 2018 Poster: The Lingering of Gradients: How to Reuse Gradients Over Time
  Zeyuan Allen-Zhu · David Simchi-Levi · Xinshang Wang
- 2018 Poster: Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
  Yuanzhi Li · Yingyu Liang
- 2018 Spotlight: Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
  Yuanzhi Li · Yingyu Liang
- 2017 Poster: Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
  Zeyuan Allen-Zhu · Elad Hazan · Wei Hu · Yuanzhi Li
- 2017 Spotlight: Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
  Zeyuan Allen-Zhu · Elad Hazan · Wei Hu · Yuanzhi Li
- 2016 Poster: Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
  Zeyuan Allen-Zhu · Yang Yuan · Karthik Sridharan
- 2016 Poster: Optimal Black-Box Reductions Between Optimization Objectives
  Zeyuan Allen-Zhu · Elad Hazan
- 2016 Poster: Even Faster SVD Decomposition Yet Without Agonizing Pain
  Zeyuan Allen-Zhu · Yuanzhi Li