Poster
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
Jason Lee · Kazusato Oko · Taiji Suzuki · Denny Wu
West Ballroom A-D #7201
Abstract:
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the unknown link function $\sigma_*: \mathbb{R} \to \mathbb{R}$ has information exponent $p$ (defined as the lowest degree in the Hermite expansion of $\sigma_*$). Prior works showed that gradient-based training of neural networks can learn this target with $n \gtrsim d^{\Theta(p)}$ samples, and such complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm (on the squared loss) learns $f_*$ with a complexity that is not governed by the information exponent. Specifically, for arbitrary polynomial single-index models, we establish a sample and runtime complexity of $n \simeq T = \Theta(d \cdot \mathrm{polylog}\, d)$, where $\Theta(\cdot)$ hides a constant depending only on the degree of $\sigma_*$; this dimension dependence matches the information-theoretic limit up to polylogarithmic factors. More generally, we show that $n \gtrsim d^{(p_* - 1) \vee 1}$ samples are sufficient to achieve low generalization error, where $p_* \le p$ is the \textit{generative exponent} of the link function. Core to our analysis is the reuse of minibatches in the gradient computation, which gives rise to higher-order information beyond correlational queries.
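Below is a minimal illustrative sketch of the kind of training procedure the abstract describes: a two-layer network trained with SGD on the squared loss for a Gaussian single-index target, where each minibatch is reused for two consecutive gradient steps. The architecture (tanh activations), the example link $\sigma_*(z) = z^3 - 3z$ (the degree-3 Hermite polynomial, so information exponent 3), the hyperparameters, and the two-step reuse schedule are all illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

# Illustrative sketch (not the paper's exact algorithm): SGD on the squared
# loss for a two-layer network, with each minibatch reused for two steps.
rng = np.random.default_rng(0)
d, n, width, lr, batch = 256, 4096, 128, 0.05, 64

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)               # hidden index direction
sigma_star = lambda z: z**3 - 3 * z          # He_3: information exponent 3

X = rng.standard_normal((n, d))              # isotropic Gaussian inputs
y = sigma_star(X @ theta)                    # single-index target

# Two-layer network f(x) = a^T tanh(W x + b)
W = rng.standard_normal((width, d)) / np.sqrt(d)
b = 0.1 * rng.standard_normal(width)
a = rng.standard_normal(width) / width

def forward(Xb):
    h = np.tanh(Xb @ W.T + b)                # hidden activations (batch, width)
    return h, h @ a                          # network output (batch,)

for step in range(2000):
    idx = rng.integers(0, n, size=batch)
    Xb, yb = X[idx], y[idx]
    # Reuse the same minibatch for two gradient steps: the second step's
    # gradient is evaluated at weights already updated with this batch,
    # which supplies information beyond a single correlational query.
    for _ in range(2):
        h, pred = forward(Xb)
        err = pred - yb                      # squared-loss residual
        grad_a = h.T @ err / batch
        grad_h = np.outer(err, a) * (1 - h**2)
        grad_W = grad_h.T @ Xb / batch
        grad_b = grad_h.mean(axis=0)
        a -= lr * grad_a
        W -= lr * grad_W
        b -= lr * grad_b

# Evaluate on fresh samples
Xt = rng.standard_normal((1024, d))
_, pt = forward(Xt)
print("test MSE:", np.mean((pt - sigma_star(Xt @ theta))**2))
```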