

Poster in Workshop: Has it Trained Yet? A Workshop for Algorithmic Efficiency in Practical Neural Network Training

Batch size selection by stochastic optimal control

Jim Zhao · Aurelien Lucchi · Frank Proske · Antonio Orvieto · Hans Kersting


Abstract: SGD and its variants are widespread in the field of machine learning. Although there is extensive research on the choice of step-size schedules to guarantee convergence of these methods, there is substantially less work examining the influence of the batch size on optimization. The latter is typically kept constant and chosen via experimental validation.

In this work we take a stochastic optimal control perspective to understand the effect of the batch size when optimizing non-convex functions with SGD. Specifically, we define an optimal control problem which considers the entire trajectory of SGD to choose the optimal batch size for a noisy quadratic model. We show that the batch size is inherently coupled with the step size and that, near saddle points, there is a transition time $t^*$ after which it is beneficial to increase the batch size to reduce the covariance of the stochastic gradients. We verify our results empirically on various convex and non-convex problems.
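To make the mechanism in the abstract concrete, the following is a minimal NumPy sketch, not the paper's method: SGD on a noisy quadratic where each mini-batch gradient is modeled as the exact gradient plus Gaussian noise whose covariance scales as $1/B$, with the batch size $B$ increased once after a transition time. The Hessian, noise scale, step size, transition time `t_star`, and batch sizes are all illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch (illustrative assumptions throughout): SGD on a noisy quadratic
# f(x) = 0.5 * x^T H x, where each mini-batch gradient is the exact
# gradient plus Gaussian noise with covariance (sigma^2 / B) * I.
# Increasing B after a transition time t_star shrinks the gradient
# covariance, letting the iterate settle closer to the optimum.

rng = np.random.default_rng(0)

d = 2
H = np.diag([1.0, 0.1])        # quadratic curvature (assumed)
sigma = 1.0                    # per-example gradient noise scale (assumed)
eta = 0.1                      # step size
t_star = 500                   # hypothetical transition time
B_small, B_large = 8, 512      # batch size before / after t_star

x = np.array([5.0, 5.0])
dists = []
for t in range(1000):
    B = B_small if t < t_star else B_large
    noise = (sigma / np.sqrt(B)) * rng.standard_normal(d)
    x = x - eta * (H @ x + noise)       # SGD step on the noisy gradient
    dists.append(np.linalg.norm(x))

# The noise floor ~ eta * sigma / sqrt(B) drops after the batch increase.
print("mean |x| over steps 400-500 (small batch):", np.mean(dists[400:500]))
print("mean |x| over steps 900-1000 (large batch):", np.mean(dists[900:]))
```

Under this model the iterate hovers at a noise floor proportional to $\sigma/\sqrt{B}$, so the printed distance to the optimum is noticeably smaller after the batch-size increase, matching the qualitative claim in the abstract.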
