
Variance Reduced Stochastic Gradient Descent with Neighbors
Thomas Hofmann · Aurelien Lucchi · Simon Lacoste-Julien · Brian McWilliams

Wed Dec 09 04:00 PM -- 08:59 PM (PST) @ 210 C #78

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet it is also known to be slow relative to steepest descent. Recently, variance reduction techniques such as SVRG and SAGA have been proposed to overcome this weakness. With asymptotically vanishing variance, a constant step size can be maintained, resulting in geometric convergence rates. However, these methods are either based on occasional computations of full gradients at pivot points (SVRG), or on keeping per-data-point corrections in memory (SAGA). This has the disadvantage that one cannot employ these methods in a streaming setting, and that speed-ups relative to SGD may require several epochs to materialize. This paper investigates a new class of algorithms that can exploit neighborhood structure in the training data to share and re-use information about past stochastic gradients across data points. While not designed to offer advantages in an asymptotic setting, these algorithms yield significant benefits in the transient optimization phase, in particular in a streaming or single-epoch setting. We investigate this family of algorithms in a thorough analysis and show supporting experimental results. As a side product, we provide a simple and unified proof technique for a broad class of variance reduction algorithms.
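To make the memory-based variance reduction concrete, here is a minimal sketch of vanilla SAGA on a toy one-dimensional least-squares problem, f_i(x) = (a_i·x − b_i)². The function names and data are illustrative, not from the paper; the paper's contribution additionally shares stored gradients across neighboring data points, which this sketch does not implement.

```python
import random

def grad(i, x, a, b):
    # Gradient of f_i(x) = (a_i * x - b_i)^2
    return 2.0 * a[i] * (a[i] * x - b[i])

def saga(a, b, step=0.04, iters=5000, seed=0):
    """Plain SAGA: keeps one stored gradient per data point in memory
    (the property that, as the abstract notes, precludes streaming use)."""
    rng = random.Random(seed)
    n = len(a)
    x = 0.0
    table = [grad(i, x, a, b) for i in range(n)]  # per-point gradient memory
    avg = sum(table) / n                          # running mean of the table
    for _ in range(iters):
        i = rng.randrange(n)
        g = grad(i, x, a, b)
        # Variance-reduced direction: unbiased, and its variance vanishes
        # as x approaches the optimum, allowing a constant step size.
        x -= step * (g - table[i] + avg)
        avg += (g - table[i]) / n                 # keep the mean in sync
        table[i] = g
    return x

a = [1.0, 2.0, -1.0, 0.5]
b = [2.0, 3.0, 1.0, 0.0]
# Closed-form least-squares solution for comparison
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
x_hat = saga(a, b)
```

With a constant step size, the iterate converges geometrically to the least-squares solution; the neighborhood-sharing variants studied in the paper would instead reuse one stored gradient for all points in a neighborhood, trading some bias for memory and transient-phase speed.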

Author Information

Thomas Hofmann (ETH Zurich)
Aurelien Lucchi (ETH Zurich)
Simon Lacoste-Julien (INRIA)
Brian McWilliams (ETH Zurich)
