Timezone: »
Poster
Communication-efficient Distributed SGD with Sketching
Nikita Ivkin · Daniel Rothchild · Enayat Ullah · Vladimir braverman · Ion Stoica · Raman Arora
Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #81
Large-scale distributed training of neural networks is often limited by network bandwidth, wherein the communication time overwhelms the local computation time. Motivated by the success of sketching methods in sub-linear/streaming algorithms, we introduce Sketched-SGD, an algorithm for carrying out distributed SGD by communicating sketches instead of full gradients. We show that \ssgd has favorable convergence rates on several classes of functions. When considering all communication -- both of gradients and of updated model weights -- Sketched-SGD reduces the amount of communication required compared to other gradient compression methods from $\mathcal{O}(d)$ or $\mathcal{O}(W)$ to $\mathcal{O}(\log d)$, where $d$ is the number of model parameters and $W$ is the number of workers participating in training. We run experiments on a transformer model, an LSTM, and a residual network, demonstrating up to a 40x reduction in total communication cost with no loss in final model performance. We also show experimentally that Sketched-SGD scales to at least 256 workers without increasing communication cost or degrading model performance.
Author Information
Nikita Ivkin (Amazon)
Daniel Rothchild (UC Berkeley)
Enayat Ullah (Johns Hopkins University)
Vladimir braverman (Johns Hopkins University)
Ion Stoica (UC Berkeley)
Raman Arora (Johns Hopkins University)
More from the Same Authors
-
2020 Poster: Adversarial Robustness of Supervised Sparse Coding »
Jeremias Sulam · Ramchandran Muthukumar · Raman Arora -
2020 Poster: On Convergence and Generalization of Dropout Training »
Poorya Mianjy · Raman Arora -
2019 Poster: Efficient Convex Relaxations for Streaming PCA »
Raman Arora · Teodor Vanislavov Marinov -
2019 Poster: On Differentially Private Graph Sparsification and Applications »
Raman Arora · Jalaj Upadhyay -
2019 Poster: Bandits with Feedback Graphs and Switching Costs »
Raman Arora · Teodor Vanislavov Marinov · Mehryar Mohri -
2018 Poster: Policy Regret in Repeated Games »
Raman Arora · Michael Dinitz · Teodor Vanislavov Marinov · Mehryar Mohri -
2018 Poster: Streaming Kernel PCA with $\tilde{O}(\sqrt{n})$ Random Features »
Enayat Ullah · Poorya Mianjy · Teodor Vanislavov Marinov · Raman Arora -
2018 Poster: The Physical Systems Behind Optimization Algorithms »
Lin Yang · Raman Arora · Vladimir braverman · Tuo Zhao -
2018 Poster: Differentially Private Robust Low-Rank Approximation »
Raman Arora · Vladimir braverman · Jalaj Upadhyay -
2017 Poster: Stochastic Approximation for Canonical Correlation Analysis »
Raman Arora · Teodor Vanislavov Marinov · Poorya Mianjy · Nati Srebro -
2016 Poster: Disease Trajectory Maps »
Peter Schulam · Raman Arora -
2014 Poster: Accelerated Mini-batch Randomized Block Coordinate Descent Method »
Tuo Zhao · Mo Yu · Yiming Wang · Raman Arora · Han Liu -
2013 Poster: Stochastic Optimization of PCA with Capped MSG »
Raman Arora · Andrew Cotter · Nati Srebro -
2009 Poster: On Learning Rotations »
Raman Arora -
2009 Spotlight: On Learning Rotations »
Raman Arora