Poster
Byzantine Stochastic Gradient Descent
Dan Alistarh · Zeyuan Allen-Zhu · Jerry Li
This paper studies the problem of distributed stochastic optimization in an adversarial setting where, of the $m$ machines that allegedly compute stochastic gradients in each iteration, an $\alpha$-fraction are Byzantine and may behave adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $\varepsilon$-approximate minimizers of convex functions in $T = \tilde{O}\big( \frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2} \big)$ iterations. In contrast, traditional mini-batch SGD needs only $T = O\big( \frac{1}{\varepsilon^2 m} \big)$ iterations, but cannot tolerate any Byzantine failures.
Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sample complexity and time complexity.
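To make the robust-aggregation idea concrete, here is a minimal Python sketch. It is not the paper's algorithm (which relies on a more involved concentration-based filtering procedure); it substitutes a coordinate-wise median for the usual gradient mean, a standard Byzantine-tolerant stand-in, and all function and variable names below are hypothetical.

```python
import numpy as np

def byzantine_robust_sgd_step(x, gradients, lr):
    """One SGD step with a coordinate-wise-median aggregator (hypothetical helper).

    x:         current iterate, shape (d,)
    gradients: list of m stochastic gradient vectors, each shape (d,);
               up to an alpha-fraction may be arbitrarily corrupted
    lr:        step size
    """
    G = np.stack(gradients)  # shape (m, d)
    # The mean can be dragged anywhere by a single Byzantine machine;
    # the coordinate-wise median tolerates any minority of outliers.
    robust_grad = np.median(G, axis=0)
    return x - lr * robust_grad

# Toy run: minimize f(x) = ||x||^2 / 2, so an honest gradient is x + noise.
# 8 honest machines, 2 Byzantine machines sending huge random vectors.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
for _ in range(200):
    honest = [x + rng.normal(scale=0.1, size=5) for _ in range(8)]
    byzantine = [rng.normal(scale=100.0, size=5) for _ in range(2)]
    x = byzantine_robust_sgd_step(x, honest + byzantine, lr=0.1)
print(np.linalg.norm(x))  # near 0 despite the adversarial machines
```

With plain averaging, the two Byzantine machines in this toy run would dominate every update; the median keeps the honest majority in control, which is the qualitative behavior the paper's guarantees formalize.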
Author Information
Dan Alistarh (IST Austria)
Zeyuan Allen-Zhu (Microsoft Research)
Jerry Li (Berkeley)
More from the Same Authors
- 2023 Poster: Knowledge Distillation Performs Partial Variance Reduction
  Mher Safaryan · Alexandra Peste · Dan Alistarh
- 2023 Poster: ZipLM: Inference-Aware Structured Pruning of Language Models
  Eldar Kurtić · Elias Frantar · Dan Alistarh
- 2023 Poster: CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
  Denis Kuznedelev · Eldar Kurtić · Elias Frantar · Dan Alistarh
- 2020 Poster: Scalable Belief Propagation via Relaxed Scheduling
  Vitalii Aksenov · Dan Alistarh · Janne H. Korhonen
- 2020 Poster: Adaptive Gradient Quantization for Data-Parallel SGD
  Fartash Faghri · Iman Tabrizian · Ilia Markov · Dan Alistarh · Daniel Roy · Ali Ramezani-Kebrya
- 2020 Poster: WoodFisher: Efficient Second-Order Approximation for Neural Network Compression
  Sidak Pal Singh · Dan Alistarh
- 2020 Expo Demonstration: Using Sparse Quantization for Efficient Inference on Deep Neural Networks
  Mark J Kurtz · Dan Alistarh · Saša Zelenović
- 2019 Poster: On the Convergence Rate of Training Recurrent Neural Networks
  Zeyuan Allen-Zhu · Yuanzhi Li · Zhao Song
- 2019 Poster: What Can ResNet Learn Efficiently, Going Beyond Kernels?
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2019 Poster: Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
  Zeyuan Allen-Zhu · Yuanzhi Li · Yingyu Liang
- 2019 Poster: Can SGD Learn Recurrent Neural Networks with Provable Generalization?
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2018 Poster: How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
  Zeyuan Allen-Zhu
- 2018 Poster: The Convergence of Sparsified Gradient Methods
  Dan Alistarh · Torsten Hoefler · Mikael Johansson · Nikola Konstantinov · Sarit Khirirat · Cedric Renggli
- 2018 Poster: Natasha 2: Faster Non-Convex Optimization Than SGD
  Zeyuan Allen-Zhu
- 2018 Poster: NEON2: Finding Local Minima via First-Order Oracles
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2018 Spotlight: Natasha 2: Faster Non-Convex Optimization Than SGD
  Zeyuan Allen-Zhu
- 2018 Poster: Is Q-Learning Provably Efficient?
  Chi Jin · Zeyuan Allen-Zhu · Sebastien Bubeck · Michael Jordan
- 2018 Poster: Spectral Signatures in Backdoor Attacks
  Brandon Tran · Jerry Li · Aleksander Madry
- 2018 Poster: The Lingering of Gradients: How to Reuse Gradients Over Time
  Zeyuan Allen-Zhu · David Simchi-Levi · Xinshang Wang
- 2017 Poster: QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
  Dan Alistarh · Demjan Grubic · Jerry Li · Ryota Tomioka · Milan Vojnovic
- 2017 Poster: Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
  Zeyuan Allen-Zhu · Elad Hazan · Wei Hu · Yuanzhi Li
- 2017 Poster: Communication-Efficient Distributed Learning of Discrete Distributions
  Ilias Diakonikolas · Elena Grigorescu · Jerry Li · Abhiram Natarajan · Krzysztof Onak · Ludwig Schmidt
- 2017 Oral: Communication-Efficient Distributed Learning of Discrete Distributions
  Ilias Diakonikolas · Elena Grigorescu · Jerry Li · Abhiram Natarajan · Krzysztof Onak · Ludwig Schmidt
- 2017 Spotlight: Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
  Zeyuan Allen-Zhu · Elad Hazan · Wei Hu · Yuanzhi Li
- 2017 Spotlight: Communication-Efficient Stochastic Gradient Descent, with Applications to Neural Networks
  Dan Alistarh · Demjan Grubic · Jerry Li · Ryota Tomioka · Milan Vojnovic
- 2016 Poster: Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
  Zeyuan Allen-Zhu · Yang Yuan · Karthik Sridharan
- 2016 Poster: Optimal Black-Box Reductions Between Optimization Objectives
  Zeyuan Allen-Zhu · Elad Hazan
- 2016 Poster: Even Faster SVD Decomposition Yet Without Agonizing Pain
  Zeyuan Allen-Zhu · Yuanzhi Li