Poster
A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
Mingrui Liu · Zhenxun Zhuang · Yunwen Lei · Chunyang Liao
In distributed training of deep neural networks, practitioners usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly when training some deep neural networks (e.g., RNN, LSTM) because of the exploding gradient issue. Gradient clipping is usually employed to address this issue in the single-machine setting, but exploring this technique in the distributed setting is still in its infancy: it remains unclear whether the gradient clipping scheme can take advantage of multiple machines to enjoy parallel speedup. The main technical difficulty lies in simultaneously dealing with a nonconvex loss function, a gradient that is not Lipschitz continuous, and skipped communication rounds. In this paper, we explore a relaxed-smoothness assumption on the loss landscape, which LSTM was shown to satisfy in previous works, and design a communication-efficient gradient clipping algorithm. This algorithm can be run on multiple machines, where each machine employs a gradient clipping scheme and communicates with other machines after multiple steps of gradient-based updates. Our algorithm is proved to have $O\left(\frac{1}{N\epsilon^4}\right)$ iteration complexity and $O\left(\frac{1}{\epsilon^3}\right)$ communication complexity for finding an $\epsilon$-stationary point in the homogeneous data setting, where $N$ is the number of machines. This indicates that our algorithm enjoys linear speedup and reduced communication rounds. Our proof relies on novel analysis techniques for estimating truncated random variables, which we believe are of independent interest. Our experiments on several benchmark datasets and in various scenarios demonstrate that our algorithm indeed exhibits fast convergence in practice and thus validate our theory.
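The scheme described in the abstract (each machine clips its gradient locally and communicates only after several local updates) can be illustrated with a minimal sketch. This is not the authors' exact algorithm: the clipping rule `min(lr, gamma / ||g||)`, the simple averaging step, the toy objective, and every name and default value below (`clip_step`, `local_clipped_sgd`, `quartic_grad`, `local_steps`, `gamma`, `noise`) are assumptions chosen for illustration. The machines are simulated sequentially in one process, and all of them share the same objective, mimicking the homogeneous data setting.

```python
import numpy as np


def clip_step(grad, lr, gamma):
    """Step size min(lr, gamma / ||grad||): a common clipping rule (assumed here)."""
    norm = np.linalg.norm(grad)
    return min(lr, gamma / norm) if norm > 0 else lr


def local_clipped_sgd(grad_fn, x0, n_machines=4, rounds=50, local_steps=8,
                      lr=0.1, gamma=0.5, noise=0.01, seed=0):
    """Simulate machines that run clipped SGD locally and average periodically."""
    rng = np.random.default_rng(seed)
    x_avg = np.asarray(x0, dtype=float)
    for _ in range(rounds):
        # Every machine starts the round from the shared averaged iterate.
        workers = [x_avg.copy() for _ in range(n_machines)]
        for x in workers:
            for _ in range(local_steps):
                # Stochastic gradient: true gradient plus independent Gaussian noise.
                g = grad_fn(x) + noise * rng.standard_normal(x.shape)
                x -= clip_step(g, lr, gamma) * g  # in-place clipped update
        # One communication round: average the local iterates.
        x_avg = np.mean(workers, axis=0)
    return x_avg


def quartic_grad(x):
    """Gradient of f(x) = sum(x**4) / 4, which is not globally Lipschitz."""
    return x ** 3


x_final = local_clipped_sgd(quartic_grad, x0=np.array([3.0, -2.0]))
```

In this sketch the machines communicate once per `local_steps` gradient updates, so the number of communication rounds is smaller than the number of iterations by that factor. How large the gap can be made while retaining the stated complexity is what the paper's analysis addresses.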
Author Information
Mingrui Liu (George Mason University)
Zhenxun Zhuang (Meta)
Yunwen Lei (University of Birmingham)
Chunyang Liao (Texas A&M)
More from the Same Authors
- 2023 Poster: Global Convergence Analysis of Local SGD for One-hidden-layer Convolutional Neural Network without Overparameterization
  Yajie Bao · Amarda Shehu · Mingrui Liu
- 2023 Poster: Federated Learning with Client Subsampling, Data Heterogeneity, and Unbounded Smoothness: A New Algorithm and Lower Bounds
  Michael Crawshaw · Yajie Bao · Mingrui Liu
- 2023 Poster: Toward Better PAC-Bayes Bounds for Uniformly Stable Algorithms
  Sijia Zhou · Yunwen Lei · Ata Kaban
- 2023 Poster: Bilevel Coreset Selection in Continual Learning: A New Formulation and Algorithm
  Jie Hao · Kaiyi Ji · Mingrui Liu
- 2022 Spotlight: A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
  Mingrui Liu · Zhenxun Zhuang · Yunwen Lei · Chunyang Liao
- 2022 Spotlight: Will Bilevel Optimizers Benefit from Loops
  Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying
- 2022 Poster: Robustness to Unbounded Smoothness of Generalized SignSGD
  Michael Crawshaw · Mingrui Liu · Francesco Orabona · Wei Zhang · Zhenxun Zhuang
- 2022 Poster: Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks
  Yunwen Lei · Rong Jin · Yiming Ying
- 2022 Poster: Stability and Generalization for Markov Chain Stochastic Gradient Methods
  Puyu Wang · Yunwen Lei · Yiming Ying · Ding-Xuan Zhou
- 2022 Poster: Will Bilevel Optimizers Benefit from Loops
  Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying
- 2021 Poster: Generalization Guarantee of SGD for Pairwise Learning
  Yunwen Lei · Mingrui Liu · Yiming Ying
- 2020 Poster: Improved Schemes for Episodic Memory-based Lifelong Learning
  Yunhui Guo · Mingrui Liu · Tianbao Yang · Tajana S Rosing
- 2020 Spotlight: Improved Schemes for Episodic Memory-based Lifelong Learning
  Yunhui Guo · Mingrui Liu · Tianbao Yang · Tajana S Rosing
- 2020 Poster: A Decentralized Parallel Algorithm for Training Generative Adversarial Nets
  Mingrui Liu · Wei Zhang · Youssef Mroueh · Xiaodong Cui · Jarret Ross · Tianbao Yang · Payel Das
- 2019: Coffee/Poster session 2
  Xingyou Song · Puneet Mangla · David Salinas · Zhenxun Zhuang · Leo Feng · Shell Xu Hu · Raul Puri · Wesley Maddox · Aniruddh Raghu · Prudencio Tossou · Mingzhang Yin · Ishita Dasgupta · Kangwook Lee · Ferran Alet · Zhen Xu · Jörg Franke · James Harrison · Jonathan Warrell · Guneet Dhillon · Arber Zela · Xin Qiu · Julien Niklas Siems · Russell Mendonca · Louis Schlessinger · Jeffrey Li · Georgiana Manolache · Debojyoti Dutta · Lucas Glass · Abhishek Singh · Gregor Koehler