In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster. We consider the setting in which all workers sample from the same dataset, and communicate over a sparse graph (decentralized). In this setting, current theory fails to capture important aspects of real-world behavior. First, the ‘spectral gap’ of the communication graph is not predictive of its empirical performance in (deep) learning. Second, current theory does not explain that collaboration enables larger learning rates than training alone. In fact, it prescribes smaller learning rates, which further decrease as graphs become larger, failing to explain convergence in infinite graphs. This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution. We quantify how the graph topology influences convergence in a quadratic toy problem and provide theoretical results for general smooth and (strongly) convex objectives. Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
Author Information
Thijs Vogels (EPFL)
Hadrien Hendrikx (EPFL)
Martin Jaggi (EPFL)
More from the Same Authors
-
2021 : Interpreting Language Models Through Knowledge Graph Extraction »
Vinitra Swamy · Angelika Romanou · Martin Jaggi -
2021 : Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation »
Futong Liu · Tao Lin · Martin Jaggi -
2021 : WAFFLE: Weighted Averaging for Personalized Federated Learning »
Martin Beaussart · Mary-Anne Hartley · Martin Jaggi -
2022 : Data-heterogeneity-aware Mixing for Decentralized Learning »
Yatin Dandi · Anastasiia Koloskova · Martin Jaggi · Sebastian Stich -
2022 : Decentralized Stochastic Optimization with Client Sampling »
Ziwei Liu · Anastasiia Koloskova · Martin Jaggi · Tao Lin -
2022 : Towards Provably Personalized Federated Learning via Threshold-Clustering of Similar Clients »
Mariel A Werner · Lie He · Sai Praneeth Karimireddy · Michael Jordan · Martin Jaggi -
2022 : Asynchronous speedup in decentralized optimization »
Mathieu Even · Hadrien Hendrikx · Laurent Massoulié -
2022 : Diversity through Disagreement for Better Transferability »
Matteo Pagliardini · Martin Jaggi · François Fleuret · Sai Praneeth Karimireddy -
2023 Poster: MultiMoDN—Multimodal, Multi-Task, Interpretable Modular Networks »
Vinitra Swamy · Malika Satayeva · Jibril Frej · Thierry Bossy · Thijs Vogels · Martin Jaggi · Tanja Käser · Mary-Anne Hartley -
2023 Poster: Hardware-Efficient Transformer Training via Piecewise Affine Operations »
Atli Kosson · Martin Jaggi -
2023 Poster: Faster Causal Attention Over Large Sequences Through Sparse Flash Attention »
Matteo Pagliardini · Daniele Paliotta · Martin Jaggi · François Fleuret -
2023 Poster: Collaborative Learning via Prediction Consensus »
Dongyang Fan · Celestine Mendler-Dünner · Martin Jaggi -
2023 Poster: Random-Access Infinite Context Length for Transformers »
Amirkeivan Mohtashami · Martin Jaggi -
2022 : Scalable Collaborative Learning via Representation Sharing »
Frédéric Berdoz · Abhishek Singh · Martin Jaggi · Ramesh Raskar -
2022 Poster: Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning »
Anastasiia Koloskova · Sebastian Stich · Martin Jaggi -
2022 Poster: FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings »
Jean Ogier du Terrail · Samy-Safwan Ayed · Edwige Cyffers · Felix Grimberg · Chaoyang He · Regis Loeb · Paul Mangold · Tanguy Marchand · Othmane Marfoq · Erum Mushtaq · Boris Muzellec · Constantin Philippenko · Santiago Silva · Maria Teleńczuk · Shadi Albarqouni · Salman Avestimehr · Aurélien Bellet · Aymeric Dieuleveut · Martin Jaggi · Sai Praneeth Karimireddy · Marco Lorenzi · Giovanni Neglia · Marc Tommasi · Mathieu Andreux -
2021 : [S11] Interpreting Language Models Through Knowledge Graph Extraction »
Vinitra Swamy · Angelika Romanou · Martin Jaggi -
2021 : Q&A with Martin Jaggi »
Martin Jaggi -
2021 : Learning with Strange Gradients, Martin Jaggi »
Martin Jaggi -
2021 Poster: Breaking the centralized barrier for cross-device federated learning »
Sai Praneeth Karimireddy · Martin Jaggi · Satyen Kale · Mehryar Mohri · Sashank Reddi · Sebastian Stich · Ananda Theertha Suresh -
2021 Oral: Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms »
Mathieu Even · Raphaël Berthier · Francis Bach · Nicolas Flammarion · Hadrien Hendrikx · Pierre Gaillard · Laurent Massoulié · Adrien Taylor -
2021 Poster: RelaySum for Decentralized Deep Learning on Heterogeneous Data »
Thijs Vogels · Lie He · Anastasiia Koloskova · Sai Praneeth Karimireddy · Tao Lin · Sebastian Stich · Martin Jaggi -
2021 Poster: Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms »
Mathieu Even · Raphaël Berthier · Francis Bach · Nicolas Flammarion · Hadrien Hendrikx · Pierre Gaillard · Laurent Massoulié · Adrien Taylor -
2020 Poster: Ensemble Distillation for Robust Model Fusion in Federated Learning »
Tao Lin · Lingjing Kong · Sebastian Stich · Martin Jaggi -
2020 Poster: Dual-Free Stochastic Decentralized Optimization with Variance Reduction »
Hadrien Hendrikx · Francis Bach · Laurent Massoulié -
2020 Poster: Practical Low-Rank Communication Compression in Decentralized Deep Learning »
Thijs Vogels · Sai Praneeth Karimireddy · Martin Jaggi -
2020 Poster: Model Fusion via Optimal Transport »
Sidak Pal Singh · Martin Jaggi -
2019 Poster: PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization »
Thijs Vogels · Sai Praneeth Karimireddy · Martin Jaggi -
2019 Poster: Unsupervised Scalable Representation Learning for Multivariate Time Series »
Jean-Yves Franceschi · Aymeric Dieuleveut · Martin Jaggi -
2019 Poster: An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums »
Hadrien Hendrikx · Francis Bach · Laurent Massoulié -
2018 Poster: COLA: Decentralized Linear Learning »
Lie He · Yatao Bian · Martin Jaggi -
2018 Poster: Sparsified SGD with Memory »
Sebastian Stich · Jean-Baptiste Cordonnier · Martin Jaggi -
2018 Poster: Training DNNs with Hybrid Block Floating Point »
Mario Drumond · Tao Lin · Martin Jaggi · Babak Falsafi -
2017 : Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning »
Hadrien Hendrikx -
2017 Poster: Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning »
El Mahdi El-Mhamdi · Rachid Guerraoui · Hadrien Hendrikx · Alexandre Maurer -
2017 Poster: Safe Adaptive Importance Sampling »
Sebastian Stich · Anant Raj · Martin Jaggi -
2017 Spotlight: Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning »
El Mahdi El-Mhamdi · Rachid Guerraoui · Hadrien Hendrikx · Alexandre Maurer -
2017 Spotlight: Safe Adaptive Importance Sampling »
Sebastian Stich · Anant Raj · Martin Jaggi -
2017 Poster: Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees »
Francesco Locatello · Michael Tschannen · Gunnar Ratsch · Martin Jaggi -
2017 Poster: Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems »
Celestine Dünner · Thomas Parnell · Martin Jaggi -
2015 Poster: On the Global Linear Convergence of Frank-Wolfe Optimization Variants »
Simon Lacoste-Julien · Martin Jaggi -
2014 Workshop: OPT2014: Optimization for Machine Learning »
Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck -
2014 Poster: Communication-Efficient Distributed Dual Coordinate Ascent »
Martin Jaggi · Virginia Smith · Martin Takac · Jonathan Terhorst · Sanjay Krishnan · Thomas Hofmann · Michael Jordan -
2013 Workshop: Greedy Algorithms, Frank-Wolfe and Friends - A modern perspective »
Martin Jaggi · Zaid Harchaoui · Federico Pierucci