Workshop: OPT2020: Optimization for Machine Learning

Contributed Video: How to make your optimizer generalize better, Sharan Vaswani

Sharan Vaswani


We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. For over-parameterized linear regression, where there are infinitely many interpolating solutions, different optimization methods can converge to solutions with varying generalization performance. In this setting, we show that projections onto linear spans can be used to move between solutions. Furthermore, via a simple reparameterization, we can ensure that an arbitrary optimizer converges to the minimum l2-norm solution with favourable generalization properties. For under-parameterized linear classification, optimizers can converge to different decision boundaries separating the data. We prove that for any such classifier, there exists a family of quadratic norms ||.||_P such that the classifier's direction is the same as that of the maximum P-margin solution. We argue that analyzing convergence to the standard maximum l2-margin is arbitrary and show that minimizing the norm induced by the data can result in better generalization. We validate our theoretical results via experiments on synthetic and real datasets.