We analyse the learning performance of Distributed Gradient Descent in the context of multi-agent decentralised non-parametric regression with the square loss function when i.i.d. samples are assigned to agents. We show that if agents hold sufficiently many samples with respect to the network size, then Distributed Gradient Descent achieves optimal statistical rates with a number of iterations that scales, up to a threshold, with the inverse of the spectral gap of the gossip matrix divided by the number of samples owned by each agent raised to a problem-dependent power. The presence of the threshold comes from statistics. It encodes the existence of a "big data" regime where the number of required iterations does not depend on the network topology. In this regime, Distributed Gradient Descent achieves optimal statistical rates with the same order of iterations as gradient descent run with all the samples in the network. Provided the communication delay is sufficiently small, the distributed protocol yields a linear speed-up in runtime compared to the single-machine protocol. This is in contrast to decentralised optimisation algorithms that do not exploit statistics and only yield a linear speed-up in graphs where the spectral gap is bounded away from zero. Our results exploit the statistical concentration of quantities held by agents and shed new light on the interplay between statistics and communication in decentralised methods. Bounds are given in the standard non-parametric setting with source/capacity assumptions.
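To make the protocol concrete, here is a minimal NumPy sketch of Distributed Gradient Descent for least-squares regression: each agent averages its iterate with its neighbours through a doubly stochastic gossip matrix, then takes a gradient step on its own samples. The ring topology, step size, and iteration count are illustrative choices, not the parameters analysed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_local, dim = 8, 50, 3
w_star = np.array([1.0, -2.0, 0.5])  # ground-truth regression vector

# Each agent v holds an i.i.d. local sample (X_v, y_v).
X = rng.standard_normal((n_agents, n_local, dim))
y = X @ w_star + 0.1 * rng.standard_normal((n_agents, n_local))

# Doubly stochastic gossip matrix for a ring graph:
# each agent averages with its two neighbours.
W = np.zeros((n_agents, n_agents))
for v in range(n_agents):
    W[v, v] = 0.5
    W[v, (v - 1) % n_agents] = 0.25
    W[v, (v + 1) % n_agents] = 0.25

# DGD iteration: one gossip (consensus) step, then one local gradient step.
w = np.zeros((n_agents, dim))
eta = 0.05
for _ in range(500):
    residual = np.einsum('vnd,vd->vn', X, w) - y          # X_v w_v - y_v
    grads = np.einsum('vnd,vn->vd', X, residual) / n_local  # local gradients
    w = W @ w - eta * grads

# After enough iterations the agents approximately agree on a vector
# close to w_star, despite each only ever seeing its own samples.
```

With a constant step size the iterates carry a small consensus bias that shrinks with `eta` and the spectral gap of `W`; the paper's analysis uses the statistical concentration of the local quantities to control this interplay.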
Author Information
Dominic Richards (University of Oxford)
Patrick Rebeschini (University of Oxford)
More from the Same Authors
- 2023 Poster: A Novel Framework for Policy Mirror Descent with General Parametrization and Linear Convergence » Carlo Alfano · Rui Yuan · Patrick Rebeschini
- 2023 Poster: Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes » Emmeran Johnson · Ciara Pike-Burke · Patrick Rebeschini
- 2021 Poster: Implicit Regularization in Matrix Sensing via Mirror Descent » Fan Wu · Patrick Rebeschini
- 2021 Poster: Distributed Machine Learning with Sparse Heterogeneous Data » Dominic Richards · Sahand Negahban · Patrick Rebeschini
- 2021 Poster: Time-independent Generalization Bounds for SGLD in Non-convex Settings » Tyler Farghly · Patrick Rebeschini
- 2021 Poster: On Optimal Interpolation in Linear Regression » Eduard Oravkin · Patrick Rebeschini
- 2020 Poster: A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval » Fan Wu · Patrick Rebeschini
- 2020 Spotlight: A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval » Fan Wu · Patrick Rebeschini
- 2020 Poster: The Statistical Complexity of Early-Stopped Mirror Descent » Tomas Vaskevicius · Varun Kanade · Patrick Rebeschini
- 2020 Spotlight: The Statistical Complexity of Early-Stopped Mirror Descent » Tomas Vaskevicius · Varun Kanade · Patrick Rebeschini
- 2019 Poster: Implicit Regularization for Optimal Sparse Recovery » Tomas Vaskevicius · Varun Kanade · Patrick Rebeschini
- 2019 Poster: Decentralized Cooperative Stochastic Bandits » David Martínez-Rubio · Varun Kanade · Patrick Rebeschini
- 2017 Poster: Accelerated consensus via Min-Sum Splitting » Patrick Rebeschini · Sekhar C Tatikonda