
On the Role of Optimization in Double Descent: A Least Squares Study
Ilja Kuzborskij · Csaba Szepesvari · Omar Rivasplata · Amal Rannen-Triki · Razvan Pascanu

Tue Dec 07 08:30 AM -- 10:00 AM (PST)

Empirically, it has been observed that the performance of deep neural networks steadily improves with increased model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomenon has been proposed to reconcile this observation with theory, suggesting that the test error has a second descent when the model becomes sufficiently overparameterized, as the model size itself acts as an implicit regularizer. In this paper we add to the growing body of work in this space, providing a careful study of learning dynamics as a function of model size for the least squares scenario. We show an excess risk bound for the gradient descent solution of the least squares objective. The bound depends on the smallest non-zero eigenvalue of the sample covariance matrix of the input features, via a functional form that exhibits the double descent behaviour. This gives a new perspective on the double descent curves reported in the literature, as our analysis of the excess risk allows us to decouple the effect of optimization from the generalization error. In particular, we find that in the case of noiseless regression, double descent is explained solely by optimization-related quantities, which was missed in studies focusing on the Moore-Penrose pseudoinverse solution. We believe that our derivation provides an alternative view compared to existing works, shedding some light on a possible cause of this phenomenon, at least in the considered least squares setting. We empirically explore whether our predictions hold for neural networks, in particular whether the spectrum of the sample covariance of features at intermediary hidden layers behaves similarly to what our derivations predict in the least squares setting.
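The abstract's key quantity is the smallest non-zero eigenvalue of the sample covariance of the input features, which dips near the interpolation threshold (model size roughly equal to sample size) and recovers on either side. The following is a minimal sketch of that effect under an assumed toy setting of i.i.d. Gaussian features; the sample size `n`, the dimension sweep, and the rank-detection threshold are all illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40  # number of samples (illustrative choice)

def smallest_nonzero_eig(d, n=n, rng=rng):
    """Smallest non-zero eigenvalue of the sample covariance X^T X / n
    for an n x d matrix of i.i.d. standard Gaussian features."""
    X = rng.standard_normal((n, d))
    eigs = np.linalg.eigvalsh(X.T @ X / n)
    # Discard numerically-zero eigenvalues (rank deficiency when d > n).
    nonzero = eigs[eigs > 1e-10]
    return nonzero.min()

# Sweep the model size d across the interpolation threshold d = n:
# the smallest non-zero eigenvalue is small near d = n and larger
# in both the under- and over-parameterized regimes.
dims = [5, 20, 40, 80, 200]
vals = {d: smallest_nonzero_eig(d) for d in dims}
for d, v in vals.items():
    print(f"d = {d:4d}  smallest non-zero eigenvalue = {v:.4f}")
```

In this toy sweep the eigenvalue reaches its minimum around d = n, mirroring the peak of the double descent curve, since the paper's excess risk bound depends on this eigenvalue through a functional form with the double descent shape.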

Author Information

Ilja Kuzborskij (DeepMind)
Csaba Szepesvari (DeepMind / University of Alberta)
Omar Rivasplata (IMSS UCL)

My top-level areas of interest are statistical learning theory, machine learning, probability and statistics. These days I am very interested in deep learning and reinforcement learning. I am affiliated with the Institute for Mathematical and Statistical Sciences, University College London, hosted by the Department of Statistical Science as a Senior Research Fellow. Before my current post I was at UCL Mathematics for a few months, and previously I was at UCL Computer Science for a few years, where I did research studies (machine learning) sponsored by DeepMind and in parallel I was a research scientist intern at DeepMind for three years. Back in the day I studied undergraduate maths (BSc 2000, Pontificia Universidad Católica del Perú) and graduate maths (MSc 2005, PhD 2012, University of Alberta). I've lived in Peru, in Canada, and now I'm based in the UK.

Amal Rannen-Triki (DeepMind)
Razvan Pascanu (Google DeepMind)
