Timezone: »
Poster
Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay
Zhiyuan Li · Tianhao Wang · Dingli Yu
We prove the Fast Equilibrium Conjecture proposed by Li et al., (2020), i.e., stochastic gradient descent (SGD) on a scale-invariant loss (e.g., using networks with various normalization schemes) with learning rate $\eta$ and weight decay factor $\lambda$ mixes in function space in $\mathcal{\tilde{O}}(\frac{1}{\lambda\eta})$ steps, under two standard assumptions: (1) the noise covariance matrix is non-degenerate and (2) the minimizers of the loss form a connected, compact and analytic manifold. The analysis uses the framework of Li et al., (2021) and shows that for every $T>0$, the iterates of SGD with learning rate $\eta$ and weight decay factor $\lambda$ on the scale-invariant loss converge in distribution in $\Theta\left(\eta^{-1}\lambda^{-1}(T+\ln(\lambda/\eta))\right)$ iterations as $\eta\lambda\to 0$ while satisfying $\eta \le O(\lambda)\le O(1)$. Moreover, the evolution of the limiting distribution can be described by a stochastic differential equation that mixes to the same equilibrium distribution for every initialization around the manifold of minimizers as $T\to\infty$.
Author Information
Zhiyuan Li (Stanford University)
Tianhao Wang (Yale University)
Dingli Yu (Princeton University)
More from the Same Authors
-
2022 : How Sharpness-Aware Minimization Minimizes Sharpness? »
Kaiyue Wen · Tengyu Ma · Zhiyuan Li -
2022 : How Sharpness-Aware Minimization Minimizes Sharpness? »
Kaiyue Wen · Tengyu Ma · Zhiyuan Li -
2022 Poster: New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound »
Arushi Gupta · Nikunj Saunshi · Dingli Yu · Kaifeng Lyu · Sanjeev Arora -
2022 Poster: Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent »
Zhiyuan Li · Tianhao Wang · Jason Lee · Sanjeev Arora -
2022 Poster: Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets »
Yifei Min · Tianhao Wang · Ruitu Xu · Zhaoran Wang · Michael Jordan · Zhuoran Yang -
2022 Poster: A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits »
Jiafan He · Tianhao Wang · Yifei Min · Quanquan Gu -
2022 Poster: Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction »
Kaifeng Lyu · Zhiyuan Li · Sanjeev Arora -
2021 Poster: On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) »
Zhiyuan Li · Sadhika Malladi · Sanjeev Arora -
2021 Poster: Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias »
Kaifeng Lyu · Zhiyuan Li · Runzhe Wang · Sanjeev Arora -
2021 Poster: Variance-Aware Off-Policy Evaluation with Linear Function Approximation »
Yifei Min · Tianhao Wang · Dongruo Zhou · Quanquan Gu -
2021 Poster: Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints »
Tianhao Wang · Dongruo Zhou · Quanquan Gu -
2019 : Break / Poster Session 1 »
Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · GaĆ«l Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang