Workshop: OPT 2022: Optimization for Machine Learning

adaStar: A Method for Adapting to Interpolation

Gary Cheng · John Duchi


Stochastic convex optimization methods are much faster at minimizing interpolation problems—problems where all sample losses share a common minimizer—than non-interpolating problems. However, to attain these fast rates, stochastic gradient methods must use step sizes tailored to the interpolation setting, which are sub-optimal for non-interpolating problems. This is problematic because verifying whether a problem is interpolating, without minimizing it, is difficult. Moreover, because interpolation is not a stable property—the addition of a single datapoint can transform an interpolating dataset into a non-interpolating one—we would like our methods to attain the fast interpolation rate when possible while remaining robust to such perturbations. We address these problems by presenting an adaptive stochastic gradient method, termed adaStar, which attains the optimal, fast rate on smooth interpolation problems (up to log factors) and degrades gracefully with the minimal objective value on non-interpolating problems. We use this method as a building block to construct another stochastic gradient method, termed adaStar-G, which adapts to interpolation and growth conditions, attaining even faster rates.
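The abstract does not specify adaStar's update rule, so the sketch below does not implement it. It is only a toy experiment, under assumed settings (least-squares losses, synthetic Gaussian data, a hand-picked constant step size), illustrating the phenomenon the first sentence describes: on an interpolating problem, where every sample loss is minimized at the same point, plain SGD with a constant step size drives the objective to (near) zero, while on an otherwise identical non-interpolating problem the same step size leaves it hovering at the noise floor.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)

# Interpolating problem: labels are exactly linear in X, so every
# sample loss (x_i @ w - y_i)^2 is zero at w = w_star -- all samples
# share a common minimizer.
y_interp = X @ w_star
# Non-interpolating problem: label noise destroys the shared minimizer.
y_noisy = y_interp + rng.normal(scale=0.5, size=n)

def sgd(X, y, step, epochs=50):
    """Constant-step-size SGD on the least-squares objective."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = 2.0 * (X[i] @ w - y[i]) * X[i]  # per-sample gradient
            w -= step * grad
    return w

step = 0.02  # hand-picked; small enough relative to per-sample smoothness
w_interp = sgd(X, y_interp, step)
w_noisy = sgd(X, y_noisy, step)

mse_interp = np.mean((X @ w_interp - y_interp) ** 2)
mse_noisy = np.mean((X @ w_noisy - y_noisy) ** 2)
print(f"interpolating MSE:     {mse_interp:.2e}")  # near zero
print(f"non-interpolating MSE: {mse_noisy:.2e}")   # near the noise floor
```

The gap between the two final objective values is large despite identical step sizes, which is the adaptivity question the paper targets: a step size good for one regime is not good for the other, and one cannot easily tell the regimes apart in advance.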
