Poster
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen · Jiamin Ni · Songtao Lu · Xiaodong Cui · Pin-Yu Chen · Xiao Sun · Naigang Wang · Swagath Venkataramani · Vijayalakshmi (Viji) Srinivasan · Wei Zhang · Kailash Gopalakrishnan

Wed Dec 09 09:00 AM -- 11:00 AM (PST) @ Poster Session 3 #757

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing compression methods do not scale well to large-scale distributed systems (due to gradient build-up) and/or lack evaluations on large datasets. To mitigate these issues, we propose a new compression technique, Scalable Sparsified Gradient Compression (ScaleCom), that (i) leverages similarity in the gradient distribution amongst learners to provide a commutative compressor and keep communication cost constant with respect to the number of workers and (ii) includes a low-pass filter in local gradient accumulations to mitigate the impacts of large batch size training and significantly improve scalability. Using theoretical analysis, we show that ScaleCom provides favorable convergence guarantees and is compatible with gradient all-reduce techniques. Furthermore, we experimentally demonstrate that ScaleCom has small overheads, directly reduces gradient traffic, and provides high compression rates (70-150X) and excellent scalability (up to 64-80 learners and 10X larger batch sizes over normal training) across a wide range of applications (image, language, and speech) without significant accuracy loss.
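
The two ideas in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the shared sparsity pattern is chosen or what form the low-pass filter takes, so the leader-based top-k selection, the `beta` filter coefficient, and the function names `top_k_indices` and `compressed_step` are all assumptions made for illustration. The sketch shows that when every worker sends values at the same k indices, compression commutes with summation, so the payload stays at k values however many workers take part.

```python
import numpy as np


def top_k_indices(vec, k):
    """Return the indices of the k largest-magnitude entries of vec."""
    return np.argpartition(np.abs(vec), -k)[-k:]


def compressed_step(grads, memories, k, leader, beta=1.0):
    """Run one illustrative compression round over all workers.

    grads, memories: lists of 1-D float arrays, one pair per worker.
    leader: worker whose top-k indices everyone reuses (assumption).
    beta: hypothetical low-pass coefficient for the local memory;
          beta=1.0 reduces to plain error feedback.
    """
    # Each worker adds its fresh gradient to its local residual memory.
    acc = [m + g for m, g in zip(memories, grads)]

    # Assumption: one worker picks the sparsity pattern for everybody.
    idx = top_k_indices(acc[leader], k)

    # All workers send values at the SAME indices, so the reduction is
    # an ordinary sum over k values (all-reduce friendly).
    reduced = sum(a[idx] for a in acc) / len(acc)
    update = np.zeros_like(acc[0])
    update[idx] = reduced

    # Low-pass filtered memory update: blend the old memory with the
    # part of the accumulated gradient that was not transmitted.
    new_memories = []
    for m, a in zip(memories, acc):
        residual = a.copy()
        residual[idx] = 0.0
        new_memories.append((1.0 - beta) * m + beta * residual)
    return update, idx, new_memories
```

Calling `compressed_step` with two or with sixty-four workers transmits the same k values per worker, which is the sense in which the cost is constant in the number of workers under these assumptions. Choosing `beta` below 1 damps how quickly untransmitted gradient mass builds up in the local memory.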

Author Information

Chia-Yu Chen (IBM research)

My research areas focus on: accelerator architecture; compiler design and library development; machine learning and neural networks; VLSI and nano devices.

Jiamin Ni (IBM)
Songtao Lu (IBM Research)
Xiaodong Cui (IBM T. J. Watson Research Center)
Pin-Yu Chen (IBM Research AI)
Xiao Sun (IBM Thomas J. Watson Research Center)
Naigang Wang (IBM T. J. Watson Research Center)
Swagath Venkataramani (IBM Research)
Vijayalakshmi (Viji) Srinivasan (IBM TJ Watson)
Wei Zhang (IBM T.J.Watson Research Center)

BE, Beijing Univ of Technology, 2005; MSc, Technical University of Denmark, 2008; PhD, University of Wisconsin, Madison, 2013. All in computer science. Published papers in ASPLOS, OOPSLA, OSDI, PLDI, IJCAI, ICDM, NIPS.

Kailash Gopalakrishnan (IBM Research)
