Stochastic Polyak Stepsize with a Moving Target
Robert Gower · Aaron Defazio · Mike Rabbat

We propose a new stochastic gradient method that uses recorded past loss values to compute adaptive stepsizes. Our starting point is to show that the SP (Stochastic Polyak) method directly exploits interpolated models. That is, SP is a subsampled Newton-Raphson method applied to solving certain interpolation equations, and these equations only hold for models that interpolate the data. We then use this viewpoint to develop a new variant of the SP method, called MOTAPS, that converges without interpolation. The MOTAPS method uses n auxiliary variables, one for each data point, that track the loss value for that data point. These auxiliary variables and the loss values are then used to set the stepsize.
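The Newton-Raphson reading of SP can be made concrete with a small sketch. The code below is an illustration only, not the authors' implementation: it assumes a least-squares loss f_i(w) = 0.5 (x_i·w − y_i)², assumes the interpolation equations take the form f_i(w) = 0 (so the target loss is zero), and uses hypothetical names (`sp_step`, `target`, `eps`). Each step solves the linearization of one sampled equation f_i(w) = target with the smallest possible change in w, which gives the stepsize (f_i(w) − target) / ||∇f_i(w)||². MOTAPS itself, with its per-data-point auxiliary variables, is not implemented here.

```python
import numpy as np

def sp_step(w, x_i, y_i, target=0.0, eps=1e-12):
    """One Stochastic Polyak step for f_i(w) = 0.5 * (x_i @ w - y_i)**2.

    `target` is the assumed optimal loss value for data point i;
    target = 0 corresponds to the interpolation assumption.
    """
    residual = x_i @ w - y_i
    loss = 0.5 * residual ** 2
    grad = residual * x_i              # gradient of f_i at w
    grad_sq = grad @ grad
    if grad_sq <= eps:                 # gradient vanishes: nothing to do
        return w
    stepsize = (loss - target) / grad_sq   # Polyak stepsize
    return w - stepsize * grad

# Synthetic problem with more parameters than data points,
# so a model that fits every data point exactly exists.
rng = np.random.default_rng(0)
n, d = 20, 50
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)

w = np.zeros(d)
initial_loss = 0.5 * np.mean((X @ w - y) ** 2)
for _ in range(5000):
    i = rng.integers(n)                # subsample one interpolation equation
    w = sp_step(w, X[i], y[i])
final_loss = 0.5 * np.mean((X @ w - y) ** 2)
```

In this sketch the zero target is only valid because the synthetic data can be fit exactly. When no model interpolates the data, a fixed zero target is no longer attainable, which is the situation the tracked per-data-point loss values in MOTAPS are meant to handle.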

We provide a global convergence theory for MOTAPS by showing that it can be interpreted as a special variant of online SGD. We also perform several numerical experiments on convex learning problems, and on non-convex learning problems based on image classification and language translation. Across all of these tasks, we show that MOTAPS is competitive with the relevant baseline method.

Author Information

Robert Gower (Flatiron Institute)
Aaron Defazio (Facebook AI Research)
Mike Rabbat (Facebook FAIR)

More from the Same Authors

  • 2021 : Poster Session 2 (gather.town)
    Wenjie Li · Akhilesh Soni · Jinwuk Seok · Jianhao Ma · Jeffery Kline · Mathieu Tuli · Miaolan Xie · Robert Gower · Quanqi Hu · Matteo Cacciola · Yuanlu Bai · Boyue Li · Wenhao Zhan · Shentong Mo · Junhyung Lyle Kim · Sajad Fathi Hafshejani · Chris Junchi Li · Zhishuai Guo · Harshvardhan Harshvardhan · Neha Wadia · Tatjana Chavdarova · Difan Zou · Zixiang Chen · Aman Gupta · Jacques Chen · Betty Shea · Benoit Dherin · Aleksandr Beznosikov
  • 2020 : Poster Session 2 (gather.town)
    Sharan Vaswani · Nicolas Loizou · Wenjie Li · Preetum Nakkiran · Zhan Gao · Sina Baghal · Jingfeng Wu · Roozbeh Yousefzadeh · Jinyi Wang · Jing Wang · Cong Xie · Anastasia Borovykh · Stanislaw Jastrzebski · Soham Dan · Yiliang Zhang · Mark Tuddenham · Sarath Pattathil · Ievgen Redko · Jeremy Cohen · Yasaman Esfandiari · Zhanhong Jiang · Mostafa ElAraby · Chulhee Yun · Michael Psenka · Robert Gower · Xiaoyu Wang
  • 2019 Poster: Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
    Mahmoud Assran · Joshua Romoff · Nicolas Ballas · Joelle Pineau · Mike Rabbat
  • 2019 Poster: RSN: Randomized Subspace Newton
    Robert Gower · Dmitry Kovalev · Felix Lieder · Peter Richtarik
  • 2019 Poster: Towards closing the gap between the theory and practice of SVRG
    Othmane Sebbouh · Nidham Gazagnadou · Samy Jelassi · Francis Bach · Robert Gower