Poster
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails
Ashok Cutkosky · Harsh Mehta
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points with high probability, at the best-known rates for smooth losses, when the gradients only have bounded $\mathfrak{p}$th moments for some $\mathfrak{p}\in(1,2]$. We then consider the case of second-order smooth losses, which to our knowledge has not been studied in this setting, and again obtain high-probability bounds for any $\mathfrak{p}$. Moreover, our results hold for arbitrary smooth norms, in contrast to the typical SGD analysis, which requires a Hilbert space norm. Further, we show that after a suitable "burn-in" period, the objective value decreases monotonically at every iteration until a critical point is identified, which provides intuition behind the popular practice of learning rate "warm-up" and also yields a last-iterate guarantee.
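The abstract names a combination of three standard ingredients: gradient clipping, momentum, and normalized gradient descent. The following is a minimal sketch of one plausible way to combine them; the update order, hyperparameter names, and default values are illustrative assumptions, not the paper's exact algorithm or its analyzed step sizes.

```python
import numpy as np

def clipped_normalized_momentum(grad_fn, x0, lr=0.05, beta=0.9,
                                clip=1.0, steps=200):
    """Illustrative sketch (not the paper's exact method):
    clip each stochastic gradient, average with momentum,
    then take a step in the normalized momentum direction."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(steps):
        g = np.asarray(grad_fn(x), dtype=float)  # possibly heavy-tailed estimate
        g_norm = np.linalg.norm(g)
        if g_norm > clip:                        # clipping tames heavy tails
            g = g * (clip / g_norm)
        m = beta * m + (1.0 - beta) * g          # exponential momentum average
        m_norm = np.linalg.norm(m)
        if m_norm > 0.0:
            x = x - lr * m / m_norm              # normalized descent step
    return x

# Usage on a deterministic quadratic f(x) = ||x||^2, grad = 2x:
x_final = clipped_normalized_momentum(lambda x: 2.0 * x, [3.0, 4.0])
```

Because the step is normalized, the iterate cannot converge exactly; it approaches the critical point and then oscillates within a neighborhood whose size scales with the learning rate, which is why schedules like warm-up and decay matter in practice.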
Author Information
Ashok Cutkosky (Boston University)
Harsh Mehta (Google Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Oral: High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails »
  Sat. Dec 11th 12:20 -- 12:35 AM Room
More from the Same Authors
- 2021 Spotlight: Online Selective Classification with Limited Feedback »
  Aditya Gangrade · Anil Kag · Ashok Cutkosky · Venkatesh Saligrama
- 2022 Poster: Transformer Memory as a Differentiable Search Index »
  Yi Tay · Vinh Tran · Mostafa Dehghani · Jianmo Ni · Dara Bahri · Harsh Mehta · Zhen Qin · Kai Hui · Zhe Zhao · Jai Gupta · Tal Schuster · William Cohen · Donald Metzler
- 2021 Poster: Online Selective Classification with Limited Feedback »
  Aditya Gangrade · Anil Kag · Ashok Cutkosky · Venkatesh Saligrama
- 2021 Poster: Logarithmic Regret from Sublinear Hints »
  Aditya Bhaskara · Ashok Cutkosky · Ravi Kumar · Manish Purohit
- 2018 Poster: Distributed Stochastic Optimization via Adaptive SGD »
  Ashok Cutkosky · Róbert Busa-Fekete
- 2017 Poster: Stochastic and Adversarial Online Learning without Hyperparameters »
  Ashok Cutkosky · Kwabena A Boahen
- 2016 Poster: Online Convex Optimization with Unconstrained Domains and Losses »
  Ashok Cutkosky · Kwabena A Boahen