Poster
Precise Learning Curves and Higher-Order Scalings for Dot-product Kernel Regression
Lechao Xiao · Jeffrey Pennington · Theodor Misiakiewicz · Hong Hu · Yue Lu

Wed Nov 30 02:00 PM -- 04:00 PM (PST) @ Hall J #940
As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes. Currently, theoretical understanding of the learning curves that characterize how the prediction error depends on the number of samples is restricted to either large-sample asymptotics ($m\to\infty$) or, for certain simple data distributions, to the high-dimensional asymptotics in which the number of samples scales linearly with the dimension ($m\propto d$). There is a wide gulf between these two regimes, including all higher-order scaling relations $m\propto d^r$, which are the subject of the present paper. We focus on the problem of kernel ridge regression for dot-product kernels and present precise formulas for the mean of the test error, bias, and variance, for data drawn uniformly from the sphere with isotropic random labels in the $r$th-order asymptotic scaling regime $m\to\infty$ with $m/d^r$ held constant. We observe a peak in the learning curve whenever $m \approx d^r/r!$ for any integer $r$, leading to multiple sample-wise descent and nontrivial behavior at multiple scales.
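
To make the peak condition $m \approx d^r/r!$ concrete, the following minimal sketch tabulates the predicted peak locations for a given input dimension. The function name `predicted_peak_locations` and the choice $d = 10$ are illustrative assumptions, not taken from the paper; the code only evaluates the formula stated in the abstract and does not simulate kernel ridge regression.

```python
from math import factorial


def predicted_peak_locations(d, max_order):
    """Return {r: d**r / r!} for r = 1..max_order.

    This evaluates the sample sizes m at which the abstract reports a peak
    in the learning curve. The helper name is hypothetical.
    """
    return {r: d**r / factorial(r) for r in range(1, max_order + 1)}


# Illustrative dimension; any positive integer d works.
peaks = predicted_peak_locations(d=10, max_order=3)
for r, m in peaks.items():
    print(f"r={r}: peak near m = {m:.1f}")
```

For $d = 10$ this gives peaks near $m = 10$, $50$, and roughly $167$ for $r = 1, 2, 3$, which illustrates how quickly the successive peaks spread apart as the order grows.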

Author Information

Lechao Xiao (Google Brain)

Lechao is a research scientist on the Brain team at Google Research, where he works on machine learning and deep learning. Prior to Google Brain, he was a Hans Rademacher Instructor of Mathematics at the University of Pennsylvania, where he worked on harmonic analysis. He earned his PhD in mathematics from the University of Illinois at Urbana-Champaign and his BA in pure and applied math from Zhejiang University, Hangzhou, China. Lechao's research interests include the theory of machine learning and deep learning, optimization, Gaussian processes, and generalization.

Jeffrey Pennington (Google Brain)
Theodor Misiakiewicz (Stanford University)
Hong Hu (Harvard)
Yue Lu (Harvard University)
