Reducing Communication in Nonconvex Federated Learning with a Novel Single-Loop Variance Reduction Method
Kazusato Oko · Shunta Akiyama · Tomoya Murata · Taiji Suzuki
Event URL: https://openreview.net/forum?id=pYBZZzbJtE
In Federated Learning (FL), inter-client heterogeneity causes two types of errors: (i) \emph{client drift error}, which is induced by multiple local updates, and (ii) \emph{client sampling error}, which is due to partial participation of clients at each communication round. While several solutions have been offered for the former, there is still much room for improvement on the latter. We provide a fundamental solution to this client sampling error. The key is a novel single-loop variance reduction algorithm, SLEDGE (Single-Loop mEthoD for Gradient Estimator), which does not require periodic computation of the full gradient but achieves optimal gradient complexity in the nonconvex finite-sum setting. While sampling only a small number of clients at each communication round, the proposed FL algorithm, FLEDGE, requires provably fewer, or at least an equivalent number of, communication rounds compared to any existing method for finding first-order and even second-order stationary points in the general nonconvex setting and under the PL condition. Moreover, as Hessian heterogeneity between clients decreases, the required number of communication rounds approaches $\tilde{\Theta}(1)$.
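To make the notion of client sampling error concrete, the following minimal sketch estimates how far a gradient averaged over a random subset of clients deviates from the average over all clients. It is only an illustration and is not SLEDGE or FLEDGE: the scalar per-client "gradients", the function names, and the client counts are all hypothetical assumptions, chosen to show that the error vanishes when clients are homogeneous or when every client participates.

```python
import random


def full_gradient(client_grads):
    """Average of all clients' gradients (full participation)."""
    return sum(client_grads) / len(client_grads)


def sampled_gradient(client_grads, num_sampled, rng):
    """Average over a uniformly sampled subset of clients (partial participation)."""
    subset = rng.sample(client_grads, num_sampled)
    return sum(subset) / num_sampled


def sampling_error_mse(client_grads, num_sampled, trials=2000, seed=0):
    """Monte Carlo estimate of the mean squared client sampling error."""
    rng = random.Random(seed)
    target = full_gradient(client_grads)
    squared_errors = [
        (sampled_gradient(client_grads, num_sampled, rng) - target) ** 2
        for _ in range(trials)
    ]
    return sum(squared_errors) / trials


# Hypothetical scalar "gradients" for 10 clients (an assumption for illustration).
heterogeneous = [float(i) for i in range(10)]  # clients disagree
homogeneous = [4.5] * 10                       # clients agree

if __name__ == "__main__":
    for m in (2, 5, 10):
        print(m, sampling_error_mse(heterogeneous, m))
    print("homogeneous:", sampling_error_mse(homogeneous, 2))
```

In this toy setting the error shrinks as more clients are sampled per round and disappears under full participation, which is the gap that a variance-reduced gradient estimator aims to close while still sampling few clients.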
Author Information
Kazusato Oko (The University of Tokyo)
Shunta Akiyama (Tokyo University)
Tomoya Murata (NTT DATA Mathematical Systems Inc.)
Taiji Suzuki (The University of Tokyo/RIKEN-AIP)
More from the Same Authors
- 2021 Spotlight: Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space
  Taiji Suzuki · Atsushi Nitanda
- 2022 Poster: Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning
  Tomoya Murata · Taiji Suzuki
- 2023 Poster: Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond
  Taiji Suzuki · Denny Wu · Kazusato Oko · Atsushi Nitanda
- 2023 Poster: Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
  Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu
- 2023 Poster: Gradient-Based Feature Learning under Structured Data
  Alireza Mousavi-Hosseini · Denny Wu · Taiji Suzuki · Murat Erdogdu
- 2023 Poster: Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction
  Taiji Suzuki · Denny Wu · Atsushi Nitanda
- 2022 Spotlight: Lightning Talks 4A-2
  Barakeel Fanseu Kamhoua · Hualin Zhang · Taiki Miyagawa · Tomoya Murata · Xin Lyu · Yan Dai · Elena Grigorescu · Zhipeng Tu · Lijun Zhang · Taiji Suzuki · Wei Jiang · Haipeng Luo · Lin Zhang · Xi Wang · Young-San Lin · Huan Xiong · Liyu Chen · Bin Gu · Jinfeng Yi · Yongqiang Chen · Sandeep Silwal · Yiguang Hong · Maoyuan Song · Lei Wang · Tianbao Yang · Han Yang · MA Kaili · Samson Zhou · Deming Yuan · Bo Han · Guodong Shi · Bo Li · James Cheng
- 2022 Spotlight: Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning
  Tomoya Murata · Taiji Suzuki
- 2022 : Poster Session 2
  Jinwuk Seok · Bo Liu · Ryotaro Mitsuboshi · David Martinez-Rubio · Weiqiang Zheng · Ilgee Hong · Chen Fan · Kazusato Oko · Bo Tang · Miao Cheng · Aaron Defazio · Tim G. J. Rudner · Gabriele Farina · Vishwak Srinivasan · Ruichen Jiang · Peng Wang · Jane Lee · Nathan Wycoff · Nikhil Ghosh · Yinbin Han · David Mueller · Liu Yang · Amrutha Varshini Ramesh · Siqi Zhang · Kaifeng Lyu · David Yunis · Kumar Kshitij Patel · Fangshuo Liao · Dmitrii Avdiukhin · Xiang Li · Sattar Vakili · Jiaxin Shi
- 2022 Poster: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
  Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu · Greg Yang
- 2022 Poster: Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime
  Naoki Nishikawa · Taiji Suzuki · Atsushi Nitanda · Denny Wu
- 2022 Poster: Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization
  Yuri Kinoshita · Taiji Suzuki
- 2021 Poster: Differentiable Multiple Shooting Layers
  Stefano Massaroli · Michael Poli · Sho Sonoda · Taiji Suzuki · Jinkyoo Park · Atsushi Yamashita · Hajime Asama
- 2021 Poster: Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis
  Atsushi Nitanda · Denny Wu · Taiji Suzuki
- 2021 Poster: Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space
  Taiji Suzuki · Atsushi Nitanda
- 2017 Poster: Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization
  Tomoya Murata · Taiji Suzuki
- 2017 Poster: Trimmed Density Ratio Estimation
  Song Liu · Akiko Takeda · Taiji Suzuki · Kenji Fukumizu