The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same asynchronous SGD algorithm regardless of the delays in the gradients, depending instead only on the number of parallel devices used to implement the algorithm. Our guarantees are strictly better than the existing analyses, and we also argue that asynchronous SGD outperforms synchronous minibatch SGD in the settings we consider. For our analysis, we introduce a novel recursion based on "virtual iterates" and delay-adaptive stepsizes, which allows us to derive state-of-the-art guarantees for both convex and non-convex objectives.
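To make the setting concrete, below is a minimal simulation sketch of asynchronous SGD with a delay-adaptive stepsize. Everything here is illustrative rather than the paper's exact method: the function names (`async_sgd`, `grad`), the delay model (a uniformly random worker finishes each step), and the stepsize rule `eta0 / max(1, delay)` are assumptions for the sketch; the paper derives the precise delay-adaptive stepsize and constants.

```python
import numpy as np

def async_sgd(grad, x0, num_workers=4, steps=1000, eta0=0.1, rng=None):
    """Simulate asynchronous SGD: workers compute gradients at stale iterates,
    and the server scales each update by the observed delay (illustrative rule)."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    # Each worker starts computing a gradient at the initial point.
    jobs = [(x.copy(), 0) for _ in range(num_workers)]  # (iterate copy, start time)
    for t in range(steps):
        w = int(rng.integers(num_workers))   # the worker that finishes next (toy model)
        x_old, t_start = jobs[w]
        delay = t - t_start                  # staleness of the returned gradient
        eta = eta0 / max(1, delay)           # delay-adaptive stepsize (illustrative)
        x -= eta * grad(x_old, rng)          # apply the stale stochastic gradient
        jobs[w] = (x.copy(), t)              # worker restarts from the current iterate
    return x

# Example: noisy gradient of f(x) = 0.5 * ||x||^2, whose minimizer is the origin.
grad = lambda x, rng: x + 0.01 * rng.standard_normal(x.shape)
x_final = async_sgd(grad, x0=np.ones(10))
print(np.linalg.norm(x_final))
```

Scaling the stepsize inversely with the observed delay is what keeps stale gradients from destabilizing the iterates; the paper's virtual-iterate recursion is what makes the resulting guarantees depend on the number of devices rather than on the delays themselves.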
Author Information
Konstantin Mishchenko (CNRS)
Francis Bach (INRIA - École Normale Supérieure)
Francis Bach is a researcher at INRIA, where he has led the SIERRA project-team, part of the Computer Science Department at École Normale Supérieure in Paris, France, since 2011. After completing his Ph.D. in Computer Science at U.C. Berkeley, he spent two years at École des Mines and joined INRIA and École Normale Supérieure in 2007. He is interested in statistical machine learning, especially convex optimization, combinatorial optimization, sparse methods, kernel-based learning, vision, and signal processing. In recent years he has taught numerous summer-school courses on optimization, and he was program co-chair of the International Conference on Machine Learning in 2015.
Mathieu Even (INRIA)
Blake Woodworth (INRIA)
More from the Same Authors
- 2021 : On Server-Side Stepsizes in Federated Optimization: Theory Explaining the Heuristics
  Grigory Malinovsky · Konstantin Mishchenko · Peter Richtarik
- 2022 Poster: A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning
  Eloïse Berthier · Ziad Kobeissi · Francis Bach
- 2022 : Parameter Free Dual Averaging: Optimizing Lipschitz Functions in a Single Pass
  Aaron Defazio · Konstantin Mishchenko
- 2022 : Asynchronous speedup in decentralized optimization
  Mathieu Even · Hadrien Hendrikx · Laurent Massoulié
- 2023 Poster: (S)GD over Diagonal Linear Networks: Implicit bias, Large Stepsizes and Edge of Stability
  Mathieu Even · Scott Pesme · Suriya Gunasekar · Nicolas Flammarion
- 2023 Poster: On the impact of activation and normalization in obtaining isometric embeddings at initialization
  Amir Joudaki · Hadi Daneshmand · Francis Bach
- 2023 Poster: Differentiable Clustering with Perturbed Spanning Forests
  Lawrence Stewart · Francis Bach · Felipe Llinares-Lopez · Quentin Berthet
- 2023 Poster: Regularization properties of adversarially-trained linear regression
  Antonio Ribeiro · Dave Zachariah · Francis Bach · Thomas Schön
- 2023 Poster: DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method
  Ahmed Khaled Ragab Bayoumi · Konstantin Mishchenko · Chi Jin
- 2022 Spotlight: Lightning Talks 1A-4
  Siwei Wang · Jing Liu · Nianqiao Ju · Shiqian Li · Eloïse Berthier · Muhammad Faaiz Taufiq · Arsene Fansi Tchango · Chen Liang · Chulin Xie · Jordan Awan · Jean-Francois Ton · Ziad Kobeissi · Wenguan Wang · Xinwang Liu · Kewen Wu · Rishab Goel · Jiaxu Miao · Suyuan Liu · Julien Martel · Ruobin Gong · Francis Bach · Chi Zhang · Rob Cornish · Sanmi Koyejo · Zhi Wen · Yee Whye Teh · Yi Yang · Jiaqi Jin · Bo Li · Yixin Zhu · Vinayak Rao · Wenxuan Tu · Gaetan Marceau Caron · Arnaud Doucet · Xinzhong Zhu · Joumana Ghosn · En Zhu
- 2022 Spotlight: A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning
  Eloïse Berthier · Ziad Kobeissi · Francis Bach
- 2022 : Asynchronous Optimization: Delays, Stability, and the Impact of Data Heterogeneity
  Konstantin Mishchenko
- 2022 Poster: Variational inference via Wasserstein gradient flows
  Marc Lambert · Sinho Chewi · Francis Bach · Silvère Bonnabel · Philippe Rigollet
- 2022 Poster: Towards Optimal Communication Complexity in Distributed Non-Convex Optimization
  Kumar Kshitij Patel · Lingxiao Wang · Blake Woodworth · Brian Bullins · Nati Srebro
- 2022 Poster: On the Theoretical Properties of Noise Correlation in Stochastic Optimization
  Aurelien Lucchi · Frank Proske · Antonio Orvieto · Francis Bach · Hans Kersting
- 2022 Poster: Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization
  Benjamin Dubois-Taine · Francis Bach · Quentin Berthet · Adrien Taylor
- 2022 Poster: On Sample Optimality in Personalized Collaborative and Federated Learning
  Mathieu Even · Laurent Massoulié · Kevin Scaman
- 2022 Poster: Muffliato: Peer-to-Peer Privacy Amplification for Decentralized Optimization and Averaging
  Edwige Cyffers · Mathieu Even · Aurélien Bellet · Laurent Massoulié
- 2022 Poster: Active Labeling: Streaming Stochastic Gradients
  Vivien Cabannes · Francis Bach · Vianney Perchet · Alessandro Rudi
- 2021 Poster: An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning
  Blake Woodworth · Nathan Srebro
- 2021 Poster: A Stochastic Newton Algorithm for Distributed Convex Optimization
  Brian Bullins · Kshitij Patel · Ohad Shamir · Nathan Srebro · Blake Woodworth
- 2021 Oral: Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms
  Mathieu Even · Raphaël Berthier · Francis Bach · Nicolas Flammarion · Hadrien Hendrikx · Pierre Gaillard · Laurent Massoulié · Adrien Taylor
- 2021 Poster: Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms
  Mathieu Even · Raphaël Berthier · Francis Bach · Nicolas Flammarion · Hadrien Hendrikx · Pierre Gaillard · Laurent Massoulié · Adrien Taylor
- 2020 : Francis Bach - Where is Machine Learning Going?
  Francis Bach
- 2020 Poster: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
  Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry
- 2020 Poster: Random Reshuffling: Simple Analysis with Vast Improvements
  Konstantin Mishchenko · Ahmed Khaled Ragab Bayoumi · Peter Richtarik
- 2020 Poster: Minibatch vs Local SGD for Heterogeneous Distributed Learning
  Blake Woodworth · Kumar Kshitij Patel · Nati Srebro
- 2020 Spotlight: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
  Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry
- 2019 : Spotlight talks
  Damien Scieur · Konstantin Mishchenko · Rohan Anil
- 2019 : Poster Session
  Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma
- 2018 Poster: Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization
  Blake Woodworth · Jialei Wang · Adam Smith · Brendan McMahan · Nati Srebro
- 2018 Spotlight: Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization
  Blake Woodworth · Jialei Wang · Adam Smith · Brendan McMahan · Nati Srebro
- 2018 Poster: The Everlasting Database: Statistical Validity at a Fair Price
  Blake Woodworth · Vitaly Feldman · Saharon Rosset · Nati Srebro
- 2018 Poster: SEGA: Variance Reduction via Gradient Sketching
  Filip Hanzely · Konstantin Mishchenko · Peter Richtarik
- 2017 : Concluding remarks
  Francis Bach · Benjamin Guedj · Pascal Germain
- 2017 : Neil Lawrence, Francis Bach and François Laviolette
  Neil Lawrence · Francis Bach · Francois Laviolette
- 2017 : Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance
  Francis Bach
- 2017 : Overture
  Benjamin Guedj · Francis Bach · Pascal Germain
- 2017 Workshop: (Almost) 50 shades of Bayesian Learning: PAC-Bayesian trends and insights
  Benjamin Guedj · Pascal Germain · Francis Bach
- 2017 Poster: On Structured Prediction Theory with Calibrated Convex Surrogate Losses
  Anton Osokin · Francis Bach · Simon Lacoste-Julien
- 2017 Oral: On Structured Prediction Theory with Calibrated Convex Surrogate Losses
  Anton Osokin · Francis Bach · Simon Lacoste-Julien
- 2017 Poster: Nonlinear Acceleration of Stochastic Algorithms
  Damien Scieur · Francis Bach · Alexandre d'Aspremont
- 2017 Poster: Implicit Regularization in Matrix Factorization
  Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro
- 2017 Spotlight: Implicit Regularization in Matrix Factorization
  Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro
- 2017 Poster: Integration Methods and Optimization Algorithms
  Damien Scieur · Vincent Roulet · Francis Bach · Alexandre d'Aspremont
- 2016 : Francis Bach. Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression.
  Francis Bach
- 2016 : Submodular Functions: from Discrete to Continuous Domains
  Francis Bach
- 2016 Poster: Tight Complexity Bounds for Optimizing Composite Objectives
  Blake Woodworth · Nati Srebro
- 2016 Tutorial: Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity
  Suvrit Sra · Francis Bach