Batch size selection by stochastic optimal control
Jim Zhao · Aurelien Lucchi · Frank Proske · Antonio Orvieto · Hans Kersting
Event URL: https://openreview.net/forum?id=Te-9Ig5ftj
SGD and its variants are widespread in the field of machine learning. Although there is extensive research on the choice of step-size schedules to guarantee convergence of these methods, there is substantially less work examining the influence of the batch size on optimization. The latter is typically kept constant and chosen via experimental validation.

In this work we take a stochastic optimal control perspective to understand the effect of the batch size when optimizing non-convex functions with SGD. Specifically, we define an optimal control problem, which considers the entire trajectory of SGD to choose the optimal batch size for a noisy quadratic model. We show that the batch size is inherently coupled with the step size and that for saddles there is a transition-time $t^*$, after which it is beneficial to increase the batch size to reduce the covariance of the stochastic gradients. We verify our results empirically on various convex and non-convex problems.
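For illustration only, here is a minimal Python sketch of the setting described in the abstract: SGD on a noisy quadratic whose gradient-noise covariance scales as 1/B with the batch size B, where the batch size is increased after a transition step. The Hessian, noise level, step size, switch point `t_star`, and the two batch sizes are assumptions chosen for the sketch; they are not the schedule derived from the paper's optimal control problem.

```python
import numpy as np

# Illustrative sketch (not the paper's derivation): SGD on a noisy quadratic
#   f(x) = 0.5 * x^T H x,  with gradient noise whose covariance scales as 1/B.
# A saddle is modeled by giving H one negative eigenvalue. After an assumed
# transition step t_star, the batch size is increased to shrink gradient noise.

rng = np.random.default_rng(0)

H = np.diag([1.0, 0.5, -0.2])    # one negative eigenvalue -> saddle at x = 0
sigma2 = 1.0                     # per-sample gradient-noise variance (assumed)
eta = 0.05                       # step size; the paper couples it with the batch size
t_star = 200                     # assumed transition time, not the derived t^*

def stochastic_grad(x, batch_size):
    """Exact gradient H x plus noise with covariance (sigma2 / batch_size) * I."""
    noise = rng.normal(scale=np.sqrt(sigma2 / batch_size), size=x.shape)
    return H @ x + noise

x = np.array([1.0, 1.0, 1e-3])   # start near the saddle along the escape direction
for t in range(1000):
    B = 8 if t < t_star else 128  # small batch early, larger batch after t_star
    x = x - eta * stochastic_grad(x, B)

print("final iterate:", x)
print("final loss:", 0.5 * x @ H @ x)
```

In this toy run the noise early on helps the iterate leave the saddle along the negative-curvature direction, while the larger batch after `t_star` reduces the covariance of the stochastic gradients, which is the qualitative effect the abstract describes.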
Author Information
Jim Zhao (ETHZ - ETH Zurich)
Aurelien Lucchi (Swiss Federal Institute of Technology)
Frank Proske (Department of Mathematics, University of Oslo)
Antonio Orvieto (ETH Zurich)
PhD Student at ETH Zurich. I’m interested in the design and analysis of optimization algorithms for deep learning. Interned at DeepMind, MILA, and Meta. All publications at http://orvi.altervista.org/ Looking for postdoc positions! :) antonio.orvieto@inf.ethz.ch
Hans Kersting (INRIA)

I am a postdoctoral researcher at the Sierra team at INRIA Paris, advised by Francis Bach. My research focuses on probabilistic methods for machine learning, especially in the context of dynamical systems and optimization.
More from the Same Authors
- 2022 : Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
  Sanghwan Kim · Lorenzo Noci · Antonio Orvieto · Thomas Hofmann
- 2023 Poster: Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
  Sotiris Anagnostidis · Dario Pavllo · Luca Biggio · Lorenzo Noci · Aurelien Lucchi · Thomas Hofmann
- 2023 Poster: A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression
  Tin Sum Cheng · Aurelien Lucchi · Anastasis Kratsios · Ivan Dokmanić · David Belius
- 2022 Poster: On the Theoretical Properties of Noise Correlation in Stochastic Optimization
  Aurelien Lucchi · Frank Proske · Antonio Orvieto · Francis Bach · Hans Kersting
- 2022 Poster: Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse
  Lorenzo Noci · Sotiris Anagnostidis · Luca Biggio · Antonio Orvieto · Sidak Pal Singh · Aurelien Lucchi
- 2022 Poster: Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution
  Antonio Orvieto · Simon Lacoste-Julien · Nicolas Loizou
- 2021 : Empirics on the expressiveness of Randomized Signature
  Enea Monzio Compagnoni · Luca Biggio · Antonio Orvieto
- 2021 Poster: Rethinking the Variational Interpretation of Accelerated Optimization Methods
  Peiyuan Zhang · Antonio Orvieto · Hadi Daneshmand
- 2021 Poster: On the Second-order Convergence Properties of Random Search Methods
  Aurelien Lucchi · Antonio Orvieto · Adamos Solomou
- 2020 Poster: Batch normalization provably avoids ranks collapse for randomly initialised deep networks
  Hadi Daneshmand · Jonas Kohler · Francis Bach · Thomas Hofmann · Aurelien Lucchi
- 2020 Poster: Convolutional Generation of Textured 3D Meshes
  Dario Pavllo · Graham Spinks · Thomas Hofmann · Marie-Francine Moens · Aurelien Lucchi
- 2020 Oral: Convolutional Generation of Textured 3D Meshes
  Dario Pavllo · Graham Spinks · Thomas Hofmann · Marie-Francine Moens · Aurelien Lucchi
- 2019 : Spotlight talks
  Paul Grigas · Zhewei Yao · Aurelien Lucchi · Si Yi Meng
- 2019 Poster: Shadowing Properties of Optimization Algorithms
  Antonio Orvieto · Aurelien Lucchi
- 2019 Poster: Continuous-time Models for Stochastic Optimization Algorithms
  Antonio Orvieto · Aurelien Lucchi
- 2019 Poster: A Domain Agnostic Measure for Monitoring and Evaluating GANs
  Paulina Grnarova · Kfir Y. Levy · Aurelien Lucchi · Nathanael Perraudin · Ian Goodfellow · Thomas Hofmann · Andreas Krause
- 2017 Poster: Stabilizing Training of Generative Adversarial Networks through Regularization
  Kevin Roth · Aurelien Lucchi · Sebastian Nowozin · Thomas Hofmann
- 2016 Poster: Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy
  Aryan Mokhtari · Hadi Daneshmand · Aurelien Lucchi · Thomas Hofmann · Alejandro Ribeiro
- 2015 Poster: Variance Reduced Stochastic Gradient Descent with Neighbors
  Thomas Hofmann · Aurelien Lucchi · Simon Lacoste-Julien · Brian McWilliams