For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS, and we know of special examples for which SGD-trained NNs provably outperform RKHS methods. This is true even in the wide-network limit, under a different scaling of the initialization.
How can we reconcile the above claims? For which tasks do NNs outperform RKHS methods? If the covariates are nearly isotropic, RKHS methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation. Here we show that this curse of dimensionality becomes milder if the covariates display the same low-dimensional structure as the target function, and we precisely characterize this tradeoff. Building on these results, we present the spiked covariates model, which captures in a unified framework both behaviors observed in earlier work.
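For concreteness, one hedged way to write such a spiked covariates model is the following; the symbols d, d_0, x_1, x_2, f_* and \varphi are illustrative notation chosen here, not the paper's exact definitions:

\[
x = (x_1, x_2) \in \mathbb{R}^{d_0} \times \mathbb{R}^{d - d_0}, \qquad d_0 \ll d, \qquad \operatorname{Var}(x_1) \gg \operatorname{Var}(x_2), \qquad f_*(x) = \varphi(x_1).
\]

That is, the covariates carry most of their variance in a low-dimensional "spiked" block x_1, and the target function depends on that same block; this alignment between covariate structure and target structure is what makes the curse of dimensionality milder.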
We hypothesize that such a latent low-dimensional structure is present in image classification. We test this hypothesis numerically by showing that specific perturbations of the training distribution degrade the performance of RKHS methods much more significantly than that of NNs.
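The sketch below is a minimal numerical illustration of this idea, not the paper's experimental setup: data are drawn from a toy spiked covariates model, the scale of the uninformative coordinates plays the role of the distribution perturbation, and kernel ridge regression with an RBF kernel stands in for the RKHS method. All names and parameter choices (n, d, d0, the noise levels, the bandwidth rule) are ours.

import numpy as np

rng = np.random.default_rng(0)

n, d, d0 = 400, 60, 4  # samples, ambient dimension, spiked dimension (our choices)

def sample(n, noise):
    # Spiked block: a few coordinates with large variance; the target uses only these.
    x1 = 3.0 * rng.standard_normal((n, d0))
    # Uninformative block: its scale plays the role of the perturbation.
    x2 = noise * rng.standard_normal((n, d - d0))
    y = np.tanh(x1.sum(axis=1) / np.sqrt(d0))
    return np.hstack([x1, x2]), y

def rbf(a, b, gamma):
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * sq)

for noise in (0.5, 1.0, 2.0):
    x_tr, y_tr = sample(n, noise)
    x_te, y_te = sample(n, noise)
    gamma = 1.0 / x_tr.var(axis=0).sum()  # bandwidth ~ inverse of total variance
    K = rbf(x_tr, x_tr, gamma)
    alpha = np.linalg.solve(K + 1e-3 * np.eye(n), y_tr)  # kernel ridge regression
    mse = np.mean((rbf(x_te, x_tr, gamma) @ alpha - y_te) ** 2)
    print(f"noise={noise:.1f}  kernel ridge test MSE={mse:.3f}")

If the latent low-dimensional structure hypothesis is right, one would expect the kernel fit to deteriorate as the uninformative block is inflated, while a model able to learn the relevant subspace (such as a small NN) would be affected less.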
Author Information
Behrooz Ghorbani (Stanford University)
Song Mei (Stanford University)
Theodor Misiakiewicz (Stanford University)
Andrea Montanari (Stanford)
More from the Same Authors
- 2022 Poster: Precise Learning Curves and Higher-Order Scalings for Dot-product Kernel Regression
  Lechao Xiao · Hong Hu · Theodor Misiakiewicz · Yue Lu · Jeffrey Pennington
- 2022 Poster: Learning with convolution and pooling operations in kernel methods
  Theodor Misiakiewicz · Song Mei
- 2021 Poster: Streaming Belief Propagation for Community Detection
  Yuchen Wu · Jakab Tardos · Mohammadhossein Bateni · André Linhares · Filipe Miguel Goncalves de Almeida · Andrea Montanari · Ashkan Norouzi-Fard
- 2019 Poster: Limitations of Lazy Training of Two-layers Neural Network
  Behrooz Ghorbani · Song Mei · Theodor Misiakiewicz · Andrea Montanari
- 2019 Spotlight: Limitations of Lazy Training of Two-layers Neural Network
  Behrooz Ghorbani · Song Mei · Theodor Misiakiewicz · Andrea Montanari
- 2018 Poster: Contextual Stochastic Block Models
  Yash Deshpande · Subhabrata Sen · Andrea Montanari · Elchanan Mossel
- 2018 Spotlight: Contextual Stochastic Block Models
  Yash Deshpande · Subhabrata Sen · Andrea Montanari · Elchanan Mossel
- 2017 Poster: Inference in Graphical Models via Semidefinite Programming Hierarchies
  Murat Erdogdu · Yash Deshpande · Andrea Montanari
- 2015: Information-theoretic bounds on learning network dynamics
  Andrea Montanari
- 2015 Poster: Convergence rates of sub-sampled Newton methods
  Murat Erdogdu · Andrea Montanari
- 2015 Poster: On the Limitation of Spectral Methods: From the Gaussian Hidden Clique Problem to Rank-One Perturbations of Gaussian Tensors
  Andrea Montanari · Daniel Reichman · Ofer Zeitouni
- 2014 Poster: A statistical model for tensor PCA
  Emile Richard · Andrea Montanari
- 2014 Poster: Cone-Constrained Principal Component Analysis
  Yash Deshpande · Andrea Montanari · Emile Richard
- 2014 Poster: Sparse PCA via Covariance Thresholding
  Yash Deshpande · Andrea Montanari
- 2013 Poster: Estimating LASSO Risk and Noise Level
  Mohsen Bayati · Murat Erdogdu · Andrea Montanari
- 2013 Poster: Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
  Adel Javanmard · Andrea Montanari
- 2013 Poster: Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
  Adel Javanmard · Andrea Montanari
- 2010 Poster: Learning Networks of Stochastic Differential Equations
  José Bento · Morteza Ibrahimi · Andrea Montanari
- 2010 Poster: The LASSO risk: asymptotic results and real world examples
  Mohsen Bayati · José Bento · Andrea Montanari
- 2009 Poster: Matrix Completion from Noisy Entries
  Raghunandan Keshavan · Andrea Montanari · Sewoong Oh
- 2009 Poster: Which graphical models are difficult to learn?
  Andrea Montanari · José Bento