Faking Interpolation Until You Make It
Alasdair Paren · Rudra Poudel · Pawan K Mudigonda

Deep over-parameterized neural networks exhibit the interpolation property on many data sets. That is, these models can achieve approximately zero loss on all training samples simultaneously. Recently, this property has been exploited to develop novel optimisation algorithms for this setting. Because the optimal loss value is known under interpolation, these algorithms can employ a variant of the Polyak step size computed on a stochastic batch of data. In this work, we introduce an algorithm that extends this idea to tasks where the interpolation property does not hold. As we no longer have access to the optimal loss values a priori, we instead estimate them for each sample online. To realise this, we introduce a simple but highly effective heuristic for approximating the optimal value based on previous loss evaluations. Through rigorous experimentation we show the effectiveness of our approach, which outperforms adaptive gradient and line search methods.
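
As a rough illustration of the Polyak step size mentioned above, the following sketch runs stochastic gradient descent on a small, consistent least-squares problem, where interpolation holds and the per-sample optimal loss is exactly zero. It does not reproduce the authors' algorithm or their heuristic for estimating optimal values. The names `polyak_sgd`, `f_star`, and `max_lr` are hypothetical, and the cap on the step size is an assumption added here for numerical safety. In the non-interpolating setting the abstract describes, the entries of `f_star` would be per-sample estimates updated online rather than fixed zeros.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def polyak_sgd(xs, ys, f_star, steps=2000, max_lr=10.0, seed=0):
    """SGD on per-sample losses f_i(w) = 0.5 * (x_i . w - y_i)^2.

    The step size is (f_i(w) - f_star[i]) / ||grad f_i(w)||^2,
    clipped to [0, max_lr]. f_star[i] is the assumed optimal loss
    of sample i (hypothetical input; zero under interpolation).
    """
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        i = rng.randrange(len(xs))
        residual = dot(w, xs[i]) - ys[i]
        loss = 0.5 * residual ** 2
        grad = [residual * xj for xj in xs[i]]
        grad_sq = dot(grad, grad)
        if grad_sq == 0.0:
            continue  # already at a stationary point for this sample
        lr = min(max(loss - f_star[i], 0.0) / grad_sq, max_lr)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def mean_loss(w, xs, ys):
    return sum(0.5 * (dot(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from a known weight vector, so zero loss is attainable.
data_rng = random.Random(1)
w_true = [1.0, -2.0, 0.5]
xs = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20)]
ys = [dot(x, w_true) for x in xs]
f_star = [0.0] * len(xs)  # interpolation: every sample can reach zero loss

w = polyak_sgd(xs, ys, f_star)
final_loss = mean_loss(w, xs, ys)
print(final_loss)
```

Note that no learning rate is tuned by hand here: the gap between the current loss and the assumed optimal loss sets the step length, which is why an accurate value (or estimate) of the optimal loss matters.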

Author Information

Alasdair Paren (University of Oxford)
Rudra Poudel (Toshiba Research)
Pawan K Mudigonda (University of Oxford)

More from the Same Authors