Poster
Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback
Shuai Zheng · Ziyue Huang · James Kwok
Thu Dec 12 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #211
Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence relies on unrealistic assumptions, and it can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum. We consider two-way compression, which compresses the gradients both to and from workers. A convergence analysis on nonconvex problems is provided for general gradient compressors. By partitioning the gradient into blocks, we introduce a blockwise compressor in which each gradient block is compressed and transmitted in 1-bit format with a scaling factor, leading to a nearly 32x reduction in communication. Experimental results show that the proposed method converges as fast as full-precision distributed momentum SGD and achieves the same test accuracy. In particular, on distributed ResNet training with 7 workers on ImageNet, the proposed algorithm achieves the same test accuracy as momentum SGD using full-precision gradients, but with 46% less wall-clock time.
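The abstract's core idea — compress each gradient block to its signs plus a single scaling factor, and carry the compression residual forward via error feedback — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name, `block_size`, and the choice of the block's mean absolute value as the scaling factor are assumptions for the example.

```python
import numpy as np

def blockwise_1bit_compress(grad, error, block_size=256):
    """Illustrative blockwise 1-bit compressor with error feedback.

    Each block is represented by its sign pattern (1 bit per entry)
    times one scalar (here: the block's mean absolute value), and the
    residual is returned so it can be added back at the next step.
    """
    corrected = grad + error                    # error feedback: add residual
    flat = corrected.ravel()
    out = np.empty_like(flat)
    for start in range(0, flat.size, block_size):
        block = flat[start:start + block_size]
        scale = np.mean(np.abs(block))          # one float per block
        out[start:start + block_size] = scale * np.sign(block)
    compressed = out.reshape(corrected.shape)
    new_error = corrected - compressed          # residual for the next iteration
    return compressed, new_error
```

In a two-way scheme, each worker would apply such a compressor before sending its gradient to the server, and the server would compress the aggregated update before broadcasting it back, each side maintaining its own error accumulator.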
Author Information
Shuai Zheng (Hong Kong University of Science and Technology / Amazon Web Services)
Ziyue Huang (Hong Kong University of Science and Technology)
James Kwok (Hong Kong University of Science and Technology)
More from the Same Authors
- 2021 Spotlight: TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation
  Haoang Chi · Feng Liu · Wenjing Yang · Long Lan · Tongliang Liu · Bo Han · William Cheung · James Kwok
- 2023 Poster: Efficient Hyper-parameter Optimization with Cubic Regularization
  Zhenqian Shen · Hansi Yang · Yong Li · James Kwok · Quanming Yao
- 2023 Poster: Nonparametric Teaching for Multiple Learners
  Chen Zhang · Xiaofeng Cao · Weiyang Liu · Ivor Tsang · James Kwok
- 2022 Poster: Multi-Objective Deep Learning with Adaptive Reference Vectors
  Weiyu Chen · James Kwok
- 2021 Poster: Effective Meta-Regularization by Kernelized Proximal Regularization
  Weisen Jiang · James Kwok · Yu Zhang
- 2021 Poster: TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation
  Haoang Chi · Feng Liu · Wenjing Yang · Long Lan · Tongliang Liu · Bo Han · William Cheung · James Kwok
- 2020 Poster: Timeseries Anomaly Detection using Temporal Hierarchical One-Class Network
  Lifeng Shen · Zhuocong Li · James Kwok
- 2020 Poster: Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
  Han Shi · Renjie Pi · Hang Xu · Zhenguo Li · James Kwok · Tong Zhang
- 2020 Poster: CSER: Communication-efficient SGD with Error Reset
  Cong Xie · Shuai Zheng · Sanmi Koyejo · Indranil Gupta · Mu Li · Haibin Lin
- 2019 Poster: Normalization Helps Training of Quantized LSTM
  Lu Hou · Jinhua Zhu · James Kwok · Fei Gao · Tao Qin · Tie-Yan Liu
- 2018 Poster: Scalable Robust Matrix Factorization with Nonconvex Loss
  Quanming Yao · James Kwok
- 2015 Poster: Fast Second Order Stochastic Backpropagation for Variational Inference
  Kai Fan · Ziteng Wang · Jeff Beck · James Kwok · Katherine Heller
- 2012 Poster: Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification
  Wei Bi · James Kwok
- 2009 Poster: Accelerated Gradient Methods for Stochastic Optimization and Online Learning
  Chonghai Hu · James Kwok · Weike Pan