Despite the development of numerous adaptive optimizers, tuning the learning rate of stochastic gradient methods remains a major roadblock to obtaining good practical performance in machine learning. Rather than changing the learning rate at each iteration, we propose an approach that automates the most common hand-tuning heuristic: use a constant learning rate until "progress stops," then drop it. We design an explicit statistical test that determines when the dynamics of stochastic gradient descent reach a stationary distribution. This test can be performed easily during training, and when it fires, we decrease the learning rate by a constant multiplicative factor. Our experiments on several deep learning tasks demonstrate that this statistical adaptive stochastic approximation (SASA) method can automatically find good learning rate schedules and match the performance of hand-tuned methods using default settings of its parameters. The statistical testing helps to control the variance of this procedure and improves its robustness.
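The abstract describes a simple control loop: monitor a statistic whose mean is zero once the SGD iterates reach a stationary distribution, test that hypothesis with a confidence interval during training, and multiply the learning rate by a constant factor when the test fires. Below is a minimal sketch of this idea for plain SGD on a toy quadratic; the statistic z_k = <x_k, g_k> - (lr/2)*||g_k||^2 (a classical stationarity condition for SGD), the window size, and the confidence level are illustrative assumptions, not the paper's exact SASA test, which also covers SGD with momentum.

    import numpy as np
    from scipy import stats

    # Toy problem: f(x) = 0.5 * ||x||^2, observed through noisy gradients.
    rng = np.random.default_rng(0)
    dim, lr, drop_factor, delta = 10, 0.5, 0.1, 0.02
    x = rng.normal(size=dim)
    samples = []  # recent values of the stationarity statistic z_k

    def noisy_grad(x):
        # True gradient plus Gaussian noise (stand-in for a minibatch gradient).
        return x + 0.1 * rng.normal(size=x.shape)

    for step in range(1, 20001):
        g = noisy_grad(x)
        # For SGD, E[<x, g>] = (lr / 2) * E[||g||^2] at stationarity, so the
        # statistic below has zero mean once "progress stops" at this lr.
        samples.append(x @ g - 0.5 * lr * (g @ g))
        x -= lr * g  # plain SGD step

        if len(samples) >= 200:
            # Test only the newest half of the samples so the transient
            # right after the previous drop does not bias the estimate.
            tail = np.asarray(samples[len(samples) // 2:])
            sem = tail.std(ddof=1) / np.sqrt(len(tail))
            half_width = stats.t.ppf(1 - delta / 2, df=len(tail) - 1) * sem
            if abs(tail.mean()) <= half_width:
                # Confidence interval for E[z] contains zero: declare
                # stationarity and drop the learning rate multiplicatively.
                lr *= drop_factor
                samples.clear()
                print(f"step {step}: dropped learning rate to {lr:.4g}")

The control flow matches the heuristic the abstract names: the learning rate stays constant until the statistical test accepts stationarity, then drops by a fixed factor, which keeps the schedule piecewise constant rather than adapting at every iteration.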
Author Information
Hunter Lang (Microsoft Research)
Lin Xiao (Microsoft Research)
Pengchuan Zhang (Microsoft Research)
More from the Same Authors
- 2021 Spotlight: Focal Attention for Long-Range Interactions in Vision Transformers
  Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao
- 2022: PEST: Combining Parameter-Efficient Fine-Tuning with Self-Training and Co-Training
  Hunter Lang · Monica Agrawal · Yoon Kim · David Sontag
- 2022 Poster: Training Subset Selection for Weak Supervision
  Hunter Lang · Aravindan Vijayaraghavan · David Sontag
- 2021 Poster: Focal Attention for Long-Range Interactions in Vision Transformers
  Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao
- 2019 Poster: A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks
  Hadi Salman · Greg Yang · Huan Zhang · Cho-Jui Hsieh · Pengchuan Zhang
- 2019 Poster: Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers
  Hadi Salman · Jerry Li · Ilya Razenshteyn · Pengchuan Zhang · Huan Zhang · Sebastien Bubeck · Greg Yang
- 2019 Spotlight: Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers
  Hadi Salman · Jerry Li · Ilya Razenshteyn · Pengchuan Zhang · Huan Zhang · Sebastien Bubeck · Greg Yang
- 2019 Poster: A Stochastic Composite Gradient Method with Incremental Variance Reduction
  Junyu Zhang · Lin Xiao
- 2019 Poster: Understanding the Role of Momentum in Stochastic Gradient Methods
  Igor Gitman · Hunter Lang · Pengchuan Zhang · Lin Xiao
- 2019 Invited Talk: Test of Time: Dual Averaging Method for Regularized Stochastic Learning and Online Optimization
  Lin Xiao
- 2018 Poster: Learning SMaLL Predictors
  Vikas Garg · Ofer Dekel · Lin Xiao
- 2018 Poster: Coupled Variational Bayes via Optimization Embedding
  Bo Dai · Hanjun Dai · Niao He · Weiyang Liu · Zhen Liu · Jianshu Chen · Lin Xiao · Le Song
- 2018 Poster: Turbo Learning for CaptionBot and DrawingBot
  Qiuyuan Huang · Pengchuan Zhang · Dapeng Wu · Lei Zhang
- 2017 Poster: Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes
  Jianshu Chen · Chong Wang · Lin Xiao · Ji He · Lihong Li · Li Deng
- 2015 Poster: End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture
  Jianshu Chen · Ji He · Yelong Shen · Lin Xiao · Xiaodong He · Jianfeng Gao · Xinying Song · Li Deng
- 2014 Poster: An Accelerated Proximal Coordinate Gradient Method
  Qihang Lin · Zhaosong Lu · Lin Xiao
- 2012 Session: Oral Session 3
  Lin Xiao
- 2009 Poster: Dual Averaging Method for Regularized Stochastic Learning and Online Optimization
  Lin Xiao