We propose a new stochastic gradient method that uses recorded past loss values to compute adaptive step sizes. Our starting point is to show that the SP (Stochastic Polyak) method directly exploits interpolated models. That is, SP is a subsampled Newton-Raphson method applied to solving certain interpolation equations, and these equations only hold for models that interpolate the data. We then use this viewpoint to develop a new variant of the SP method, called MOTAPS, that converges without interpolation. The MOTAPS method uses n auxiliary variables, one for each data point, that track the loss value of each data point. These auxiliary variables and the recorded loss values are then used to set the step size.
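To make this viewpoint concrete, here is a minimal sketch in generic notation (not taken from the paper), assuming nonnegative losses \(f_i\) with zero infimum, so that interpolation reads \(f_i(x_*) = 0\) for every data point \(i\). One Newton-Raphson step on a single sampled interpolation equation \(f_i(x) = 0\), taken as the least-norm solution of its linearization at \(x_t\), is
\[
x_{t+1} = \operatorname*{arg\,min}_{x} \|x - x_t\|^2 \quad \text{s.t.} \quad f_i(x_t) + \langle \nabla f_i(x_t),\, x - x_t \rangle = 0,
\qquad \text{i.e.} \qquad
x_{t+1} = x_t - \frac{f_i(x_t)}{\|\nabla f_i(x_t)\|^{2}}\, \nabla f_i(x_t),
\]
which is the SP update with step size \(f_i(x_t)/\|\nabla f_i(x_t)\|^{2}\).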
We provide a global convergence theory for MOTAPS by showing that it can be interpreted as a special variant of online SGD. We also perform several numerical experiments on convex learning problems, and on non-convex learning problems based on image classification and language translation. Across all of these tasks we show that MOTAPS is competitive with the relevant baseline method.
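As a purely illustrative sketch of the mechanism described above (the function name polyak_sgd_sketch, the target-tracking rule, and the cap lr_max are our assumptions, not the paper's; the exact MOTAPS updates are given in the paper), the loop below shows how per-sample auxiliary variables that record loss values can be used to set a Polyak-type step size:

```python
import numpy as np

def polyak_sgd_sketch(loss_f, grad_f, x0, n, lr_max=1.0, beta=0.1, epochs=10, seed=0):
    """Illustrative Polyak-type SGD loop with per-sample auxiliary targets.

    loss_f(i, x) and grad_f(i, x) return the loss and gradient of the i-th
    data point.  The `targets` array is a hypothetical stand-in for the
    auxiliary variables described in the abstract; the exact MOTAPS update
    rules are not reproduced here.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    targets = np.zeros(n)  # one auxiliary variable per data point
    for _ in range(epochs):
        for i in rng.permutation(n):
            loss_i = loss_f(i, x)
            g_i = grad_f(i, x)
            # Polyak-type step size: (f_i(x) - target_i)_+ / ||grad f_i(x)||^2,
            # capped at lr_max.  With targets[i] = 0 this is the classic SP
            # step size for nonnegative losses.
            gamma = min(max(loss_i - targets[i], 0.0) / (g_i @ g_i + 1e-12), lr_max)
            x = x - gamma * g_i
            # Hypothetical tracking rule: move the target toward the recorded
            # loss so the step size shrinks when the loss cannot reach zero.
            targets[i] = (1.0 - beta) * targets[i] + beta * loss_i
    return x


# Toy usage on a least-squares problem (interpolation need not hold):
rng = np.random.default_rng(1)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
x_hat = polyak_sgd_sketch(
    loss_f=lambda i, x: 0.5 * (A[i] @ x - b[i]) ** 2,
    grad_f=lambda i, x: (A[i] @ x - b[i]) * A[i],
    x0=np.zeros(5), n=50,
)
```

With the targets held at zero, the inner update reduces to the classic SP step; letting the targets track the recorded losses shrinks the step size when the model cannot drive every loss to zero, which is the regime without interpolation that MOTAPS addresses.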
Author Information
Robert Gower (Flatiron Institute)
Aaron Defazio (Facebook AI Research)
Mike Rabbat (Facebook FAIR)
More from the Same Authors
- 2022 : Parameter Free Dual Averaging: Optimizing Lipschitz Functions in a Single Pass
  Aaron Defazio · Konstantin Mishchenko
- 2022 : A Stochastic Prox-Linear Method for CVaR Minimization
  Si Yi Meng · Vasileios Charisopoulos · Robert Gower
- 2022 : Using quadratic equations for overparametrized models
  Shuang Li · William Swartworth · Martin Takac · Deanna Needell · Robert Gower
- 2022 : PSPS: Preconditioned Stochastic Polyak Step-size method for badly scaled data
  Farshed Abdukhakimov · Chulu Xiang · Dmitry Kamzolov · Robert Gower · Martin Takac
- 2022 : Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning
  John Nguyen · Jianyu Wang · Kshitiz Malik · Maziar Sanjabi · Mike Rabbat
- 2022 : The Interpolated MVU Mechanism For Communication-efficient Private Federated Learning
  Chuan Guo · Kamalika Chaudhuri · Pierre STOCK · Mike Rabbat
- 2023 Poster: Mechanic: A Learning Rate Tuner
  Ashok Cutkosky · Aaron Defazio · Harsh Mehta
- 2023 Poster: Provable convergence guarantees for black-box variational inference
  Justin Domke · Robert Gower · Guillaume Garrigos
- 2023 Poster: Variational Inference with Gaussian Score Matching
  Chirag Modi · Robert Gower · Charles Margossian · Yuling Yao · David Blei · Lawrence Saul
- 2022 : Poster Session 2
  Jinwuk Seok · Bo Liu · Ryotaro Mitsuboshi · David Martinez-Rubio · Weiqiang Zheng · Ilgee Hong · Chen Fan · Kazusato Oko · Bo Tang · Miao Cheng · Aaron Defazio · Tim G. J. Rudner · Gabriele Farina · Vishwak Srinivasan · Ruichen Jiang · Peng Wang · Jane Lee · Nathan Wycoff · Nikhil Ghosh · Yinbin Han · David Mueller · Liu Yang · Amrutha Varshini Ramesh · Siqi Zhang · Kaifeng Lyu · David Yunis · Kumar Kshitij Patel · Fangshuo Liao · Dmitrii Avdiukhin · Xiang Li · Sattar Vakili · Jiaxin Shi
- 2022 : Contributed Talks 2
  Quanquan Gu · Aaron Defazio · Jiajin Li
- 2021 : Poster Session 2 (gather.town)
  Wenjie Li · Akhilesh Soni · Jinwuk Seok · Jianhao Ma · Jeffery Kline · Mathieu Tuli · Miaolan Xie · Robert Gower · Quanqi Hu · Matteo Cacciola · Yuanlu Bai · Boyue Li · Wenhao Zhan · Shentong Mo · Junhyung Lyle Kim · Sajad Fathi Hafshejani · Chris Junchi Li · Zhishuai Guo · Harshvardhan Harshvardhan · Neha Wadia · Tatjana Chavdarova · Difan Zou · Zixiang Chen · Aman Gupta · Jacques Chen · Betty Shea · Benoit Dherin · Aleksandr Beznosikov
- 2020 : Poster Session 2 (gather.town)
  Sharan Vaswani · Nicolas Loizou · Wenjie Li · Preetum Nakkiran · Zhan Gao · Sina Baghal · Jingfeng Wu · Roozbeh Yousefzadeh · Jinyi Wang · Jing Wang · Cong Xie · Anastasia Borovykh · Stanislaw Jastrzebski · Soham Dan · Yiliang Zhang · Mark Tuddenham · Sarath Pattathil · Ievgen Redko · Jeremy Cohen · Yasaman Esfandiari · Zhanhong Jiang · Mostafa ElAraby · Chulhee Yun · Michael Psenka · Robert Gower · Xiaoyu Wang
- 2020 Poster: MRI Banding Removal via Adversarial Training
  Aaron Defazio · Tullie Murrell · Michael Recht
- 2019 Poster: Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
  Mahmoud Assran · Joshua Romoff · Nicolas Ballas · Joelle Pineau · Mike Rabbat
- 2019 Poster: On the Ineffectiveness of Variance Reduced Optimization for Deep Learning
  Aaron Defazio · Leon Bottou
- 2019 Poster: RSN: Randomized Subspace Newton
  Robert Gower · Dmitry Kovalev · Felix Lieder · Peter Richtarik
- 2019 Poster: Towards closing the gap between the theory and practice of SVRG
  Othmane Sebbouh · Nidham Gazagnadou · Samy Jelassi · Francis Bach · Robert Gower
- 2019 Poster: On the Curved Geometry of Accelerated Optimization
  Aaron Defazio