Large-scale distributed optimization has become the default tool for training supervised machine learning models with large numbers of parameters and large amounts of training data. Recent advancements in the field provide several mechanisms for speeding up training, including {\em compressed communication}, {\em variance reduction} and {\em acceleration}. However, none of these methods is capable of exploiting the inherently rich data-dependent smoothness structure of the local losses beyond standard smoothness constants. In this paper, we argue that when training supervised models, {\em smoothness matrices}---information-rich generalizations of the ubiquitous smoothness constants---can and should be exploited for further dramatic gains, both in theory and practice. To further alleviate the communication burden inherent in distributed optimization, we propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses. To showcase the power of this tool, we describe how our sparsification technique can be adapted to three distributed optimization algorithms---DCGD, DIANA and ADIANA---yielding significant savings in terms of communication complexity. The new methods always outperform the baselines, often dramatically so.
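The paper's sparsifier exploits full smoothness matrices; the sketch below is only a loose illustration of the general idea, not the paper's actual scheme. It shows an unbiased gradient sparsifier that samples coordinates with probabilities proportional to a per-coordinate smoothness proxy (e.g., the diagonal of a smoothness matrix), so that smoother directions are communicated less often. The function name, the diagonal proxy, and the budget parameter `k` are all assumptions introduced here for illustration.

```python
import numpy as np

def smoothness_aware_sparsify(grad, smoothness_diag, k, rng):
    """Illustrative (not the paper's) unbiased sparsifier.

    Keeps roughly k coordinates of `grad`, sampled with probabilities
    proportional to `smoothness_diag`, and rescales survivors by 1/p_i
    so that the output equals `grad` in expectation.
    """
    d = grad.shape[0]
    # Importance probabilities, targeting ~k kept coordinates on average.
    p = k * smoothness_diag / smoothness_diag.sum()
    p = np.minimum(p, 1.0)  # probabilities cannot exceed 1
    mask = rng.random(d) < p
    out = np.zeros_like(grad)
    out[mask] = grad[mask] / p[mask]  # rescale to preserve unbiasedness
    return out
```

Because each coordinate i is kept with probability p_i and scaled by 1/p_i, the estimator satisfies E[out_i] = grad_i, which is the standard requirement for plugging such a compressor into DCGD/DIANA-style analyses.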
Author Information
Mher Safaryan (KAUST)
Filip Hanzely (University of Edinburgh)
I am a PhD student in Optimization and Operational Research at the University of Edinburgh under the supervision of Peter Richtárik, working on stochastic non-convex optimization.
Peter Richtarik (KAUST)
More from the Same Authors
- 2021: FedMix: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning »
  Elnur Gasanov · Ahmed Khaled Ragab Bayoumi · Samuel Horváth · Peter Richtarik
- 2022 Poster: Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques »
  Bokun Wang · Mher Safaryan · Peter Richtarik
- 2022: Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation »
  Rustem Islamov · Xun Qian · Slavomír Hanzely · Mher Safaryan · Peter Richtarik
- 2023 Poster: Knowledge Distillation Performs Partial Variance Reduction »
  Mher Safaryan · Alexandra Peste · Dan Alistarh
- 2021 Poster: EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback »
  Peter Richtarik · Igor Sokolov · Ilyas Fatkhullin
- 2021 Poster: Error Compensated Distributed SGD Can Be Accelerated »
  Xun Qian · Peter Richtarik · Tong Zhang
- 2021 Poster: CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression »
  Zhize Li · Peter Richtarik
- 2021 Poster: Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks »
  Dmitry Kovalev · Elnur Gasanov · Alexander Gasnikov · Peter Richtarik
- 2021 Oral: EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback »
  Peter Richtarik · Igor Sokolov · Ilyas Fatkhullin
- 2020: Poster Session 1 (gather.town) »
  Laurent Condat · Tiffany Vlaar · Ohad Shamir · Mohammadi Zaki · Zhize Li · Guan-Horng Liu · Samuel Horváth · Mher Safaryan · Yoni Choukroun · Kumar Shridhar · Nabil Kahale · Jikai Jin · Pratik Kumar Jawanpuria · Gaurav Kumar Yadav · Kazuki Koyama · Junyoung Kim · Xiao Li · Saugata Purkayastha · Adil Salim · Dighanchal Banerjee · Peter Richtarik · Lakshman Mahto · Tian Ye · Bamdev Mishra · Huikang Liu · Jiajie Zhu
- 2015 Poster: Quartz: Randomized Dual Coordinate Ascent with Arbitrary Sampling »
  Zheng Qu · Peter Richtarik · Tong Zhang