The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training over-parameterized models. However, two issues remain unsolved in this line of work. First, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution, which may result in a worse output than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al.), this approach results in slower convergence rates under interpolation. Second, intuitive line-search methods equipped with variance reduction (VR) fail to converge (Dubois-Taine et al.). So far, no VR method has successfully accelerated these two stepsizes with a convergence guarantee.

In this work, we make two contributions. First, we propose two new robust variants of SPS and SLS, called AdaSPS and AdaSLS, which achieve optimal asymptotic rates in both the strongly convex and the convex settings, with and without interpolation, except for the case that combines strong convexity with non-interpolation. AdaSLS requires no knowledge of problem-dependent parameters, and AdaSPS requires only a lower bound on the optimal function value as input. Second, we propose a novel VR method that can use Polyak stepsizes or line-search to achieve acceleration. When it is equipped with AdaSPS or AdaSLS, the resulting algorithms obtain the optimal rate for optimizing convex smooth functions. Finally, numerical experiments on synthetic and real datasets validate our theory and demonstrate the effectiveness and robustness of our algorithms.
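The abstract builds on two existing stepsize rules but does not spell out the AdaSPS or AdaSLS formulas, so the following is only a minimal sketch of the baseline rules it refers to: the SPS_max form of the stochastic Polyak stepsize, min((f_i(x) - f_i^*) / (c ||∇f_i(x)||²), γ_max), and a backtracking Armijo stochastic line search on the sampled f_i. It assumes a hypothetical interpolating least-squares problem where every f_i^* = 0; all function names, constants (c, gamma_max, eta_max, beta) and the toy data are illustrative assumptions, and none of this is the authors' AdaSPS, AdaSLS, or variance-reduced method.

```python
import random

# Illustrative sketch of the *baseline* SPS_max and Armijo SLS rules.
# Hypothetical toy problem: f(x) = (1/n) * sum_i f_i(x),
# f_i(x) = 0.5 * (a_i . x - b_i)^2, with data built so that an exact
# solution exists (interpolation), hence every f_i^* = 0.

def f_i(x, a, b):
    r = sum(aj * xj for aj, xj in zip(a, x)) - b
    return 0.5 * r * r

def grad_f_i(x, a, b):
    r = sum(aj * xj for aj, xj in zip(a, x)) - b
    return [r * aj for aj in a]

def sq_norm(v):
    return sum(vj * vj for vj in v)

def sps_max_step(loss, grad, f_star=0.0, c=0.5, gamma_max=10.0):
    """SPS_max: min((f_i(x) - f_i^*) / (c * ||g||^2), gamma_max)."""
    g2 = sq_norm(grad)
    if g2 == 0.0:
        return 0.0  # already at a minimizer of f_i
    return min((loss - f_star) / (c * g2), gamma_max)

def armijo_sls_step(x, a, b, eta_max=10.0, c=0.1, beta=0.5, max_backtracks=50):
    """Backtrack from eta_max until the sampled f_i satisfies
    f_i(x - eta*g) <= f_i(x) - c * eta * ||g||^2."""
    loss = f_i(x, a, b)
    g = grad_f_i(x, a, b)
    g2 = sq_norm(g)
    eta = eta_max
    for _ in range(max_backtracks):
        trial = [xj - eta * gj for xj, gj in zip(x, g)]
        if f_i(trial, a, b) <= loss - c * eta * g2:
            break
        eta *= beta
    return eta

def make_interpolating_data(n=20, x_true=(1.0, -2.0, 0.5), seed=1):
    """Hypothetical data: b_i = a_i . x_true, so the loss can reach 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = [rng.uniform(-1.0, 1.0) for _ in x_true]
        b = sum(aj * tj for aj, tj in zip(a, x_true))
        data.append((a, b))
    return data

def run_sgd(data, x0, rule, steps=500, seed=0):
    """Plain SGD with one sampled f_i per step and an adaptive stepsize."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        a, b = rng.choice(data)
        g = grad_f_i(x, a, b)
        if rule == "sps":
            eta = sps_max_step(f_i(x, a, b), g)
        else:
            eta = armijo_sls_step(x, a, b)
        x = [xj - eta * gj for xj, gj in zip(x, g)]
    return x

def full_loss(x, data):
    return sum(f_i(x, a, b) for a, b in data) / len(data)

if __name__ == "__main__":
    data = make_interpolating_data()
    for rule in ("sps", "sls"):
        x = run_sgd(data, [0.0, 0.0, 0.0], rule)
        print(rule, full_loss(x, data))
```

On this interpolating toy problem both rules should drive the training loss close to zero. The abstract notes that without interpolation they are only guaranteed to reach a neighborhood of a solution, which is the issue AdaSPS and AdaSLS are designed to address.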
Author Information
Xiaowen Jiang (CISPA - Helmholtz-Zentrum für Informationssicherheit gGmbH)
A first-year PhD student at CISPA, Saarland University, supervised by Dr. Sebastian U. Stich.
Sebastian Stich (CISPA)
More from the Same Authors
2021: Escaping Local Minima With Stochastic Noise
Harshvardhan Harshvardhan · Sebastian Stich
2021: The Peril of Popular Deep Learning Uncertainty Estimation Methods
Yehao Liu · Matteo Pagliardini · Tatjana Chavdarova · Sebastian Stich
2022: Data-heterogeneity-aware Mixing for Decentralized Learning
Yatin Dandi · Anastasiia Koloskova · Martin Jaggi · Sebastian Stich
2022: Bidirectional Adaptive Communication for Heterogeneous Distributed Learning
Dmitrii Avdiukhin · Vladimir Braverman · Nikita Ivkin · Sebastian Stich
2022: Preserving privacy with PATE for heterogeneous data
Akshay Dodwadmath · Sebastian Stich
2023: Diversity-adjusted adaptive step size
Parham Yazdkhasti · Xiaowen Jiang · Sebastian Stich
2023: Noise Injection Irons Out Local Minima and Saddle Points
Konstantin Mishchenko · Sebastian Stich
2023: On the Convergence of Local SGD Under Third-Order Smoothness and Hessian Similarity
Ali Zindari · Ruichen Luo · Sebastian Stich
2023: Poster Session 2
Xiao-Yang Liu · Guy Kornowski · Philipp Dahlinger · Abbas Ehsanfar · Binyamin Perets · David Martinez-Rubio · Sudeep Raja Putta · Runlong Zhou · Connor Lawless · Julian J Stier · Chen Fan · Michal Šustr · James Spann · Jung Hun Oh · Yao Xie · Qi Zhang · Krishna Acharya · Sourabh Medapati · Sharan Vaswani · Sruthi Gorantla · Darshan Chakrabarti · Mohamed Elsayed · Hongyang Zhang · Reza Asad · Viktor Pavlovic · Betty Shea · Georgy Noarov · Chuan He · Daniil Vankov · Taoan Huang · Michael Lu · Anant Mathur · Konstantin Mishchenko · Stanley Wei · Francesco Faccio · Yuchen Zeng · Tianyue Zhang · Chris Junchi Li · Aaron Mishkin · Sina Baharlouei · Chen Xu · Sasha Abramowitz · Sebastian Stich
2023 Workshop: OPT 2023: Optimization for Machine Learning
Cristóbal Guzmán · Courtney Paquette · Katya Scheinberg · Aaron Sidford · Sebastian Stich
2022 Spotlight: Decentralized Local Stochastic Extra-Gradient for Variational Inequalities
Aleksandr Beznosikov · Pavel Dvurechenskii · Anastasiia Koloskova · Valentin Samokhin · Sebastian Stich · Alexander Gasnikov
2022 Workshop: OPT 2022: Optimization for Machine Learning
Courtney Paquette · Sebastian Stich · Quanquan Gu · Cristóbal Guzmán · John Duchi
2022 Poster: Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning
Anastasiia Koloskova · Sebastian Stich · Martin Jaggi
2022 Poster: Decentralized Local Stochastic Extra-Gradient for Variational Inequalities
Aleksandr Beznosikov · Pavel Dvurechenskii · Anastasiia Koloskova · Valentin Samokhin · Sebastian Stich · Alexander Gasnikov
2021: Contributed Talks in Session 1 (Zoom)
Sebastian Stich · Futong Liu · Abdurakhmon Sadiev · Frederik Benzing · Simon Roburin
2021: Opening Remarks to Session 1
Sebastian Stich
2021 Workshop: OPT 2021: Optimization for Machine Learning
Courtney Paquette · Quanquan Gu · Oliver Hinder · Katya Scheinberg · Sebastian Stich · Martin Takac
2021 Poster: Breaking the centralized barrier for cross-device federated learning
Sai Praneeth Karimireddy · Martin Jaggi · Satyen Kale · Mehryar Mohri · Sashank Reddi · Sebastian Stich · Ananda Theertha Suresh
2021 Poster: RelaySum for Decentralized Deep Learning on Heterogeneous Data
Thijs Vogels · Lie He · Anastasiia Koloskova · Sai Praneeth Karimireddy · Tao Lin · Sebastian Stich · Martin Jaggi
2021 Poster: An Improved Analysis of Gradient Tracking for Decentralized Machine Learning
Anastasiia Koloskova · Tao Lin · Sebastian Stich
2020: Closing remarks
Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac
2020: Contributed talks in Session 1 (Zoom)
Sebastian Stich · Laurent Condat · Zhize Li · Ohad Shamir · Tiffany Vlaar · Mohammadi Zaki
2020: Live Q&A with Volkan Cevher (Zoom)
Sebastian Stich
2020: Live Q&A with Tong Zhang (Zoom)
Sebastian Stich
2020: Welcome remarks to Session 1
Sebastian Stich
2020 Workshop: OPT2020: Optimization for Machine Learning
Courtney Paquette · Mark Schmidt · Sebastian Stich · Quanquan Gu · Martin Takac
2020: Welcome event (gather.town)
Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac
2020 Poster: Ensemble Distillation for Robust Model Fusion in Federated Learning
Tao Lin · Lingjing Kong · Sebastian Stich · Martin Jaggi
2018 Poster: Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization
Robert Gower · Filip Hanzely · Peter Richtarik · Sebastian Stich
2018 Poster: Sparsified SGD with Memory
Sebastian Stich · Jean-Baptiste Cordonnier · Martin Jaggi
2017 Poster: Safe Adaptive Importance Sampling
Sebastian Stich · Anant Raj · Martin Jaggi
2017 Spotlight: Safe Adaptive Importance Sampling
Sebastian Stich · Anant Raj · Martin Jaggi