Timezone: »
Poster
Improved Analysis of Clipping Algorithms for Non-convex Optimization
Bohang Zhang · Jikai Jin · Cong Fang · Liwei Wang
Gradient clipping is commonly used in training deep neural networks partly due to its practicability in relieving the exploding gradient problem. Recently, \citet{zhang2019gradient} show that clipped (stochastic) Gradient Descent (GD) converges faster than vanilla GD via introducing a new assumption called $(L_0, L_1)$-smoothness, which characterizes the violent fluctuation of gradients typically encountered in deep neural networks. However, their iteration complexities on the problem-dependent parameters are rather pessimistic, and theoretical justification of clipping combined with other crucial techniques, e.g. momentum acceleration, are still lacking. In this paper, we bridge the gap by presenting a general framework to study the clipping algorithms, which also takes momentum methods into consideration.We provide convergence analysis of the framework in both deterministic and stochastic setting, and demonstrate the tightness of our results by comparing them with existing lower bounds. Our results imply that the efficiency of clipping methods will not degenerate even in highly non-smooth regions of the landscape. Experiments confirm the superiority of clipping-based methods in deep learning tasks.
Author Information
Bohang Zhang (Peking University)
Jikai Jin (Peking University)
Cong Fang (Peking University)
Liwei Wang (Peking University)
More from the Same Authors
-
2021 Spotlight: Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning »
Hanzhe Hu · Fangyun Wei · Han Hu · Qiwei Ye · Jinshi Cui · Liwei Wang -
2022 Poster: CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds »
Haiyang Wang · Lihe Ding · Shaocong Dong · Shaoshuai Shi · Aoxue Li · Jianan Li · Zhenguo Li · Liwei Wang -
2022 : Minimax Optimal Kernel Operator Learning via Multilevel Training »
Jikai Jin · Yiping Lu · Jose Blanchet · Lexing Ying -
2022 Panel: Panel 4A-1: Rethinking Lipschitz Neural… & Robust Learning against… »
Yizhen Wang · Bohang Zhang -
2022 Spotlight: Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power »
Binghui Li · Jikai Jin · Han Zhong · John Hopcroft · Liwei Wang -
2022 Spotlight: Lightning Talks 2A-1 »
Caio Kalil Lauand · Ryan Strauss · Yasong Feng · lingyu gu · Alireza Fathollah Pour · Oren Mangoubi · Jianhao Ma · Binghui Li · Hassan Ashtiani · Yongqi Du · Salar Fattahi · Sean Meyn · Jikai Jin · Nisheeth Vishnoi · zengfeng Huang · Junier B Oliva · yuan zhang · Han Zhong · Tianyu Wang · John Hopcroft · Di Xie · Shiliang Pu · Liwei Wang · Robert Qiu · Zhenyu Liao -
2022 Poster: Is $L^2$ Physics Informed Loss Always Suitable for Training Physics Informed Neural Network? »
Chuwei Wang · Shanda Li · Di He · Liwei Wang -
2022 Poster: Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power »
Binghui Li · Jikai Jin · Han Zhong · John Hopcroft · Liwei Wang -
2022 Poster: Your Transformer May Not be as Powerful as You Expect »
Shengjie Luo · Shanda Li · Shuxin Zheng · Tie-Yan Liu · Liwei Wang · Di He -
2022 Poster: Rethinking Lipschitz Neural Networks and Certified Robustness: A Boolean Function Perspective »
Bohang Zhang · Du Jiang · Di He · Liwei Wang -
2021 Poster: Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis »
Jikai Jin · Bohang Zhang · Haiyang Wang · Liwei Wang -
2021 Poster: Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs »
Han Zhong · Jiayi Huang · Lin Yang · Liwei Wang -
2021 Poster: Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding »
Shengjie Luo · Shanda Li · Tianle Cai · Di He · Dinglan Peng · Shuxin Zheng · Guolin Ke · Liwei Wang · Tie-Yan Liu -
2021 Poster: Towards a Theoretical Framework of Out-of-Distribution Generalization »
Haotian Ye · Chuanlong Xie · Tianle Cai · Ruichen Li · Zhenguo Li · Liwei Wang -
2021 Poster: Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning »
Hanzhe Hu · Fangyun Wei · Han Hu · Qiwei Ye · Jinshi Cui · Liwei Wang -
2020 : Poster Session 1 (gather.town) »
Laurent Condat · Tiffany Vlaar · Ohad Shamir · Mohammadi Zaki · Zhize Li · Guan-Horng Liu · Samuel Horváth · Mher Safaryan · Yoni Choukroun · Kumar Shridhar · Nabil Kahale · Jikai Jin · Pratik Kumar Jawanpuria · Gaurav Kumar Yadav · Kazuki Koyama · Junyoung Kim · Xiao Li · Saugata Purkayastha · Adil Salim · Dighanchal Banerjee · Peter Richtarik · Lakshman Mahto · Tian Ye · Bamdev Mishra · Huikang Liu · Jiajie Zhu -
2020 Poster: Locally Differentially Private (Contextual) Bandits Learning »
Kai Zheng · Tianle Cai · Weiran Huang · Zhenguo Li · Liwei Wang -
2020 Poster: Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot »
Jingtong Su · Yihang Chen · Tianle Cai · Tianhao Wu · Ruiqi Gao · Liwei Wang · Jason Lee -
2020 Poster: RepPoints v2: Verification Meets Regression for Object Detection »
Yihong Chen · Zheng Zhang · Yue Cao · Liwei Wang · Stephen Lin · Han Hu -
2020 Poster: How to Characterize The Landscape of Overparameterized Convolutional Neural Networks »
Yihong Gu · Weizhong Zhang · Cong Fang · Jason Lee · Tong Zhang -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Session »
Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma -
2019 Poster: Convergence of Adversarial Training in Overparametrized Neural Networks »
Ruiqi Gao · Tianle Cai · Haochuan Li · Cho-Jui Hsieh · Liwei Wang · Jason Lee -
2019 Spotlight: Convergence of Adversarial Training in Overparametrized Neural Networks »
Ruiqi Gao · Tianle Cai · Haochuan Li · Cho-Jui Hsieh · Liwei Wang · Jason Lee -
2019 Poster: Equipping Experts/Bandits with Long-term Memory »
Kai Zheng · Haipeng Luo · Ilias Diakonikolas · Liwei Wang -
2019 Poster: McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds »
Rui (Ray) Zhang · Xingwu Liu · Yuyi Wang · Liwei Wang -
2019 Spotlight: McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds »
Rui (Ray) Zhang · Xingwu Liu · Yuyi Wang · Liwei Wang -
2018 Poster: Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation »
Liwei Wang · Lunjia Hu · Jiayuan Gu · Zhiqiang Hu · Yue Wu · Kun He · John Hopcroft -
2018 Spotlight: Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation »
Liwei Wang · Lunjia Hu · Jiayuan Gu · Zhiqiang Hu · Yue Wu · Kun He · John Hopcroft -
2018 Poster: SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator »
Cong Fang · Chris Junchi Li · Zhouchen Lin · Tong Zhang -
2018 Spotlight: SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator »
Cong Fang · Chris Junchi Li · Zhouchen Lin · Tong Zhang -
2018 Poster: FRAGE: Frequency-Agnostic Word Representation »
Chengyue Gong · Di He · Xu Tan · Tao Qin · Liwei Wang · Tie-Yan Liu -
2017 Poster: Decoding with Value Networks for Neural Machine Translation »
Di He · Hanqing Lu · Yingce Xia · Tao Qin · Liwei Wang · Tie-Yan Liu -
2017 Poster: Faster and Non-ergodic O(1/K) Stochastic Alternating Direction Method of Multipliers »
Cong Fang · Feng Cheng · Zhouchen Lin -
2017 Poster: The Expressive Power of Neural Networks: A View from the Width »
Zhou Lu · Hongming Pu · Feicheng Wang · Zhiqiang Hu · Liwei Wang -
2016 Poster: Dual Learning for Machine Translation »
Di He · Yingce Xia · Tao Qin · Liwei Wang · Nenghai Yu · Tie-Yan Liu · Wei-Ying Ma -
2013 Poster: Efficient Algorithm for Privately Releasing Smooth Queries »
Ziteng Wang · Kai Fan · Jiaqi Zhang · Liwei Wang -
2012 Poster: Dimensionality Dependent PAC-Bayes Margin Bound »
Chi Jin · Liwei Wang -
2009 Poster: Sufficient Conditions for Agnostic Active Learnable »
Liwei Wang