Poster
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
Kazusato Oko · Denny Wu · Jason Lee · Taiji Suzuki
West Ballroom A-D #7201
[
Abstract
]
Fri 13 Dec 11 a.m. PST
— 2 p.m. PST
Abstract:
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree-$q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n\gtrsim d^{\Theta(p)}$ samples, and such statistical complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with arbitrary polynomial link using $\tilde{O}(d)$ time and sample complexity, regardless of information exponent; this matches the information theoretic limit up to polylogarithmic factors. Core to our analysis is the reuse of minibatch in the gradient computation, which gives rise to higher-order information beyond correlational queries.
Live content is unavailable. Log in and register to view live content