Decentralized Stochastic Optimization with Client Sampling
Ziwei Liu · Anastasiia Koloskova · Martin Jaggi · Tao Lin
Event URL: https://openreview.net/forum?id=PxO6WDSnQv
Decentralized optimization is a key setting for enabling data privacy and on-device learning over networks. Existing research primarily focuses on distributing the objective function across $n$ nodes/clients, but lags behind real-world challenges such as i) node availability (not all $n$ nodes are always available during optimization) and ii) slow information propagation (caused by a large number of nodes $n$). In this work, we study Decentralized Stochastic Gradient Descent (D-SGD) with node subsampling, i.e., when only $s$ ($s \leq n$) nodes are randomly sampled out of the $n$ nodes per iteration. We provide theoretical convergence rates for smooth (convex and non-convex) problems with heterogeneous (non-identically distributed) local functions. Our theoretical results capture the effect of node subsampling and of the choice of topology on the sampled nodes through a metric termed \emph{the expected consensus rate}. On a number of common topologies, including ring and torus, we demonstrate the effectiveness of this metric both theoretically and empirically.
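The subsampling scheme described in the abstract can be illustrated with a minimal toy sketch, not the authors' implementation: at each iteration, $s$ of $n$ clients are sampled, each takes a local stochastic gradient step, and the sampled clients then gossip-average over a ring topology laid on them. The quadratic objectives, the step size, and the mixing weights below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n clients, each with a local quadratic objective
# f_i(x) = 0.5 * ||x - b_i||^2, where the targets b_i are heterogeneous
# (non-identically distributed data across clients).
n, s, d, T, lr = 16, 4, 5, 200, 0.1
b = rng.normal(size=(n, d))   # per-client optima (non-iid)
x = np.zeros((n, d))          # one model copy per client

for t in range(T):
    # Sample s of the n clients uniformly at random (node subsampling).
    active = rng.choice(n, size=s, replace=False)

    # Local stochastic gradient step on the sampled clients only;
    # unsampled clients keep their stale models.
    for i in active:
        grad = x[i] - b[i] + 0.01 * rng.normal(size=d)  # noisy gradient
        x[i] -= lr * grad

    # Gossip averaging over a ring topology placed on the sampled clients,
    # using a doubly stochastic mixing matrix (1/2 self, 1/4 per neighbor).
    order = np.sort(active)
    mixed = np.empty((s, d))
    for k in range(s):
        left, right = order[(k - 1) % s], order[(k + 1) % s]
        mixed[k] = 0.5 * x[order[k]] + 0.25 * x[left] + 0.25 * x[right]
    x[order] = mixed

# Average model across all clients; the global optimum is b.mean(axis=0).
consensus = x.mean(axis=0)
print("distance to global optimum:", np.linalg.norm(consensus - b.mean(axis=0)))
```

The ring mixing matrix is doubly stochastic, so gossip preserves the average over the sampled set while the gradient steps pull each sampled model toward its local optimum; how fast the sampled-topology gossip contracts disagreement is exactly what the paper's expected consensus rate quantifies.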
Author Information
Ziwei Liu (EPFL - EPF Lausanne)
Anastasiia Koloskova (EPFL)
Martin Jaggi (EPFL)
Tao Lin (Westlake University)
More from the Same Authors
- 2021 : Interpreting Language Models Through Knowledge Graph Extraction
  Vinitra Swamy · Angelika Romanou · Martin Jaggi
- 2021 : Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation
  Futong Liu · Tao Lin · Martin Jaggi
- 2021 : WAFFLE: Weighted Averaging for Personalized Federated Learning
  Martin Beaussart · Mary-Anne Hartley · Martin Jaggi
- 2022 : Data-heterogeneity-aware Mixing for Decentralized Learning
  Yatin Dandi · Anastasiia Koloskova · Martin Jaggi · Sebastian Stich
- 2022 : Towards Provably Personalized Federated Learning via Threshold-Clustering of Similar Clients
  Mariel A Werner · Lie He · Sai Praneeth Karimireddy · Michael Jordan · Martin Jaggi
- 2022 : Diversity through Disagreement for Better Transferability
  Matteo Pagliardini · Martin Jaggi · François Fleuret · Sai Praneeth Karimireddy
- 2023 Poster: Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy
  Anastasiia Koloskova · Ryan McKenna · Zachary Charles · John Rush · H. Brendan McMahan
- 2023 Poster: MultiMoDN—Multimodal, Multi-Task, Interpretable Modular Networks
  Vinitra Swamy · Malika Satayeva · Jibril Frej · Thierry Bossy · Thijs Vogels · Martin Jaggi · Tanja Käser · Mary-Anne Hartley
- 2023 Poster: Hardware-Efficient Transformer Training via Piecewise Affine Operations
  Atli Kosson · Martin Jaggi
- 2023 Poster: Faster Causal Attention Over Large Sequences Through Sparse Flash Attention
  Matteo Pagliardini · Daniele Paliotta · Martin Jaggi · François Fleuret
- 2023 Poster: Collaborative Learning via Prediction Consensus
  Dongyang Fan · Celestine Mendler-Dünner · Martin Jaggi
- 2023 Poster: Random-Access Infinite Context Length for Transformers
  Amirkeivan Mohtashami · Martin Jaggi
- 2022 Panel: Panel 4B-2: Decentralized Training of… & Sharper Convergence Guarantees…
  Anastasiia Koloskova · Tianyi Zhang
- 2022 Spotlight: Decentralized Local Stochastic Extra-Gradient for Variational Inequalities
  Aleksandr Beznosikov · Pavel Dvurechenskii · Anastasiia Koloskova · Valentin Samokhin · Sebastian Stich · Alexander Gasnikov
- 2022 : Scalable Collaborative Learning via Representation Sharing
  Frédéric Berdoz · Abhishek Singh · Martin Jaggi · Ramesh Raskar
- 2022 Poster: Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning
  Anastasiia Koloskova · Sebastian Stich · Martin Jaggi
- 2022 Poster: FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
  Jean Ogier du Terrail · Samy-Safwan Ayed · Edwige Cyffers · Felix Grimberg · Chaoyang He · Regis Loeb · Paul Mangold · Tanguy Marchand · Othmane Marfoq · Erum Mushtaq · Boris Muzellec · Constantin Philippenko · Santiago Silva · Maria Teleńczuk · Shadi Albarqouni · Salman Avestimehr · Aurélien Bellet · Aymeric Dieuleveut · Martin Jaggi · Sai Praneeth Karimireddy · Marco Lorenzi · Giovanni Neglia · Marc Tommasi · Mathieu Andreux
- 2022 Poster: Beyond spectral gap: the role of the topology in decentralized learning
  Thijs Vogels · Hadrien Hendrikx · Martin Jaggi
- 2022 Poster: Decentralized Local Stochastic Extra-Gradient for Variational Inequalities
  Aleksandr Beznosikov · Pavel Dvurechenskii · Anastasiia Koloskova · Valentin Samokhin · Sebastian Stich · Alexander Gasnikov
- 2021 : [S11] Interpreting Language Models Through Knowledge Graph Extraction
  Vinitra Swamy · Angelika Romanou · Martin Jaggi
- 2021 : Q&A with Martin Jaggi
  Martin Jaggi
- 2021 : Learning with Strange Gradients, Martin Jaggi
  Martin Jaggi
- 2021 Poster: Breaking the centralized barrier for cross-device federated learning
  Sai Praneeth Karimireddy · Martin Jaggi · Satyen Kale · Mehryar Mohri · Sashank Reddi · Sebastian Stich · Ananda Theertha Suresh
- 2021 Poster: RelaySum for Decentralized Deep Learning on Heterogeneous Data
  Thijs Vogels · Lie He · Anastasiia Koloskova · Sai Praneeth Karimireddy · Tao Lin · Sebastian Stich · Martin Jaggi
- 2021 Poster: An Improved Analysis of Gradient Tracking for Decentralized Machine Learning
  Anastasiia Koloskova · Tao Lin · Sebastian Stich
- 2020 Poster: Ensemble Distillation for Robust Model Fusion in Federated Learning
  Tao Lin · Lingjing Kong · Sebastian Stich · Martin Jaggi
- 2020 Poster: Practical Low-Rank Communication Compression in Decentralized Deep Learning
  Thijs Vogels · Sai Praneeth Karimireddy · Martin Jaggi
- 2020 Poster: Model Fusion via Optimal Transport
  Sidak Pal Singh · Martin Jaggi
- 2019 Poster: PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
  Thijs Vogels · Sai Praneeth Karimireddy · Martin Jaggi
- 2019 Poster: Unsupervised Scalable Representation Learning for Multivariate Time Series
  Jean-Yves Franceschi · Aymeric Dieuleveut · Martin Jaggi
- 2018 Poster: COLA: Decentralized Linear Learning
  Lie He · Yatao Bian · Martin Jaggi
- 2018 Poster: Sparsified SGD with Memory
  Sebastian Stich · Jean-Baptiste Cordonnier · Martin Jaggi
- 2018 Poster: Training DNNs with Hybrid Block Floating Point
  Mario Drumond · Tao Lin · Martin Jaggi · Babak Falsafi
- 2017 Poster: Safe Adaptive Importance Sampling
  Sebastian Stich · Anant Raj · Martin Jaggi
- 2017 Spotlight: Safe Adaptive Importance Sampling
  Sebastian Stich · Anant Raj · Martin Jaggi
- 2017 Poster: Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees
  Francesco Locatello · Michael Tschannen · Gunnar Ratsch · Martin Jaggi
- 2017 Poster: Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems
  Celestine Dünner · Thomas Parnell · Martin Jaggi
- 2015 Poster: On the Global Linear Convergence of Frank-Wolfe Optimization Variants
  Simon Lacoste-Julien · Martin Jaggi
- 2014 Workshop: OPT2014: Optimization for Machine Learning
  Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck
- 2014 Poster: Communication-Efficient Distributed Dual Coordinate Ascent
  Martin Jaggi · Virginia Smith · Martin Takac · Jonathan Terhorst · Sanjay Krishnan · Thomas Hofmann · Michael Jordan
- 2013 Workshop: Greedy Algorithms, Frank-Wolfe and Friends - A modern perspective
  Martin Jaggi · Zaid Harchaoui · Federico Pierucci