`

Timezone: »

 
Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent
Sharan Vaswani · Benjamin Dubois-Taine · Reza Babanezhad Harikandeh

We aim to design step-size schemes that make stochastic gradient descent (SGD) adaptive to (i) the noise \sigma^2 in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number \kappa, we first prove that T iterations of SGD with Nesterov acceleration and exponentially decreasing step-sizes can achieve a near-optimal \tilde{O}(\exp(-T/\sqrt{\kappa}) + \sigma^2/T) convergence rate. Under a relaxed assumption on the noise, with the same step-size scheme and knowledge of the smoothness, we prove that SGD can achieve an \tilde{O}(\exp(-T/\kappa) + \sigma^2/T) rate. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower-bounds) that SGD converges at the desired rate, but only to a neighborhood of the solution. Next, we use SGD with an offline estimate of the smoothness and prove convergence to the minimizer. However, its convergence is slowed down proportional to the estimation error and we prove a lower-bound justifying this slowdown. Compared to other step-size schemes, we empirically demonstrate the effectiveness of exponential step-sizes coupled with a novel variant of SLS.

Author Information

Sharan Vaswani (University of Alberta)
Benjamin Dubois-Taine (CNRS, ENS Paris, Inria)
Reza Babanezhad Harikandeh (Samsung/SAIL)

More from the Same Authors

  • 2021 : Poster Session 1 (gather.town) »
    Hamed Jalali · Robert Hönig · Maximus Mutschler · Manuel Madeira · Abdurakhmon Sadiev · Egor Shulgin · Alasdair Paren · Pascal Esser · Simon Roburin · Julius Kunze · Agnieszka Słowik · Frederik Benzing · Futong Liu · Hongyi Li · Ryotaro Mitsuboshi · Grigory Malinovsky · Jayadev Naram · Zhize Li · Igor Sokolov · Sharan Vaswani
  • 2020 : Poster Session 2 (gather.town) »
    Sharan Vaswani · Nicolas Loizou · Wenjie Li · Preetum Nakkiran · Zhan Gao · Sina Baghal · Jingfeng Wu · Roozbeh Yousefzadeh · Jinyi Wang · Jing Wang · Cong Xie · Anastasia Borovykh · Stanislaw Jastrzebski · Soham Dan · Yiliang Zhang · Mark Tuddenham · Sarath Pattathil · Ievgen Redko · Jeremy Cohen · Yasaman Esfandiari · Zhanhong Jiang · Mostafa ElAraby · Chulhee Yun · Michael Psenka · Robert Gower · Xiaoyu Wang
  • 2020 : Contributed talks in Session 2 (Zoom) »
    Martin Takac · Samuel Horváth · Guan-Horng Liu · Nicolas Loizou · Sharan Vaswani
  • 2020 : Contributed Video: Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search), Sharan Vaswani »
    Sharan Vaswani
  • 2020 : Contributed Video: How to make your optimizer generalize better, Sharan Vaswani »
    Sharan Vaswani
  • 2019 : Poster Session »
    Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma
  • 2019 Poster: Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates »
    Sharan Vaswani · Aaron Mishkin · Issam Laradji · Mark Schmidt · Gauthier Gidel · Simon Lacoste-Julien