We study the supervised learning problem under either of two models: (1) the feature vectors x_i are d-dimensional Gaussian and the responses are y_i = f*(x_i) for an unknown quadratic function f*; (2) the feature vectors x_i are distributed as a mixture of two d-dimensional centered Gaussians, and the y_i's are the corresponding class labels. We use two-layer neural networks with quadratic activations and compare three learning regimes: the random features (RF) regime, in which only the second-layer weights are trained; the neural tangent (NT) regime, in which a linearization of the network around its initialization is trained; and the fully trained neural network (NN) regime, in which all weights are trained. We prove that, even for the simple quadratic model of point (1), there is a potentially unbounded gap between the prediction risks achieved in these three regimes when the number of neurons is smaller than the ambient dimension. When the number of neurons is larger than the dimension, the problem is significantly easier and both NT and NN learning achieve zero risk.
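To make the three regimes concrete, here is a minimal numerical sketch for model (1) with quadratic activation, i.e. f(x) = sum_j a_j (w_j^T x)^2. This is not the authors' code: the problem sizes, the gradient-descent step size, and the use of plain least squares for the RF and NT fits are illustrative assumptions.

```python
# Sketch of the RF / NT / NN regimes for a two-layer quadratic network
# on model (1): Gaussian features, quadratic target.  Sizes, step size,
# and the least-squares solvers are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 20, 10, 2000                      # dimension, neurons, samples (assumed)

# Model (1): y = <x, B x> for an unknown symmetric matrix B.
B = rng.standard_normal((d, d))
B = (B + B.T) / (2 * d)                     # symmetrize; scale so y = O(1)
X = rng.standard_normal((n, d))             # train features
Xt = rng.standard_normal((n, d))            # test features
y = np.einsum('ni,ij,nj->n', X, B, X)
yt = np.einsum('ni,ij,nj->n', Xt, B, Xt)

W0 = rng.standard_normal((N, d)) / np.sqrt(d)   # first-layer initialization

def fit(Phi, targets):
    """Linear least squares, used for both the RF and NT fits."""
    return np.linalg.lstsq(Phi, targets, rcond=None)[0]

def risk(pred):
    return np.mean((pred - yt) ** 2)

# RF regime: freeze the first layer at W0, train only the second layer.
a_rf = fit((X @ W0.T) ** 2, y)
print("RF risk:", risk((Xt @ W0.T) ** 2 @ a_rf))

# NT regime: train the linearization of the network around W0.  Since
# d/dw_j (w_j^T x)^2 = 2 (w_j^T x) x, the NT class is spanned by the
# N*d features (w_j^T x) x_k, again fitted by least squares.
def phi(Z):
    return ((Z @ W0.T)[:, :, None] * Z[:, None, :]).reshape(len(Z), -1)

u_nt = fit(phi(X), y)
print("NT risk:", risk(phi(Xt) @ u_nt))

# NN regime: train all weights (a, W) jointly by gradient descent.
a, W, lr = np.ones(N) / N, W0.copy(), 0.02
for _ in range(5000):
    Z = X @ W.T                             # (n, N) pre-activations
    r = (Z**2) @ a - y                      # residuals
    a -= lr * (Z**2).T @ r / n
    W -= lr * 2 * a[:, None] * ((r[:, None] * Z).T @ X) / n
print("NN risk:", risk(((Xt @ W.T) ** 2) @ a))
```

With N = 10 < d = 20 as above, the paper's result suggests the three printed test risks should separate (RF worst, NN best); setting N >= d should let both the NT and NN fits drive the risk to (near) zero.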
Author Information
Behrooz Ghorbani (Stanford University)
Song Mei (Stanford University)
Theodor Misiakiewicz (Stanford University)
Andrea Montanari (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Spotlight: Limitations of Lazy Training of Two-layers Neural Network
  Thu. Dec 12th, 12:10 -- 12:15 AM, West Exhibition Hall C + B3
More from the Same Authors
- 2022 Poster: Precise Learning Curves and Higher-Order Scalings for Dot-product Kernel Regression
  Lechao Xiao · Hong Hu · Theodor Misiakiewicz · Yue Lu · Jeffrey Pennington
- 2022 Poster: Learning with convolution and pooling operations in kernel methods
  Theodor Misiakiewicz · Song Mei
- 2021 Poster: Streaming Belief Propagation for Community Detection
  Yuchen Wu · Jakab Tardos · Mohammadhossein Bateni · André Linhares · Filipe Miguel Goncalves de Almeida · Andrea Montanari · Ashkan Norouzi-Fard
- 2020 Poster: When Do Neural Networks Outperform Kernel Methods?
  Behrooz Ghorbani · Song Mei · Theodor Misiakiewicz · Andrea Montanari
- 2018 Poster: Contextual Stochastic Block Models
  Yash Deshpande · Subhabrata Sen · Andrea Montanari · Elchanan Mossel
- 2018 Spotlight: Contextual Stochastic Block Models
  Yash Deshpande · Subhabrata Sen · Andrea Montanari · Elchanan Mossel
- 2017 Poster: Inference in Graphical Models via Semidefinite Programming Hierarchies
  Murat Erdogdu · Yash Deshpande · Andrea Montanari
- 2015: Information-theoretic bounds on learning network dynamics
  Andrea Montanari
- 2015 Poster: Convergence rates of sub-sampled Newton methods
  Murat Erdogdu · Andrea Montanari
- 2015 Poster: On the Limitation of Spectral Methods: From the Gaussian Hidden Clique Problem to Rank-One Perturbations of Gaussian Tensors
  Andrea Montanari · Daniel Reichman · Ofer Zeitouni
- 2014 Poster: A statistical model for tensor PCA
  Emile Richard · Andrea Montanari
- 2014 Poster: Cone-Constrained Principal Component Analysis
  Yash Deshpande · Andrea Montanari · Emile Richard
- 2014 Poster: Sparse PCA via Covariance Thresholding
  Yash Deshpande · Andrea Montanari
- 2013 Poster: Estimating LASSO Risk and Noise Level
  Mohsen Bayati · Murat Erdogdu · Andrea Montanari
- 2013 Poster: Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
  Adel Javanmard · Andrea Montanari
- 2013 Poster: Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
  Adel Javanmard · Andrea Montanari
- 2010 Poster: Learning Networks of Stochastic Differential Equations
  José Bento · Morteza Ibrahimi · Andrea Montanari
- 2010 Poster: The LASSO risk: asymptotic results and real world examples
  Mohsen Bayati · José Bento · Andrea Montanari
- 2009 Poster: Matrix Completion from Noisy Entries
  Raghunandan Keshavan · Andrea Montanari · Sewoong Oh
- 2009 Poster: Which graphical models are difficult to learn?
  Andrea Montanari · José Bento