Local SGD, a cornerstone algorithm in federated learning, is widely used for training deep neural networks and has shown strong empirical performance, yet a theoretical understanding of this performance on nonconvex loss landscapes is currently lacking. Analyzing the global convergence of SGD is challenging because the noise depends on the model parameters. Indeed, many works narrow their focus to GD and rely on injecting noise to enable convergence to a local or the global optimum. When expanding the focus to local SGD, existing analyses in the nonconvex case can only guarantee finding stationary points, or they assume the network is overparameterized so that convergence to the global minimum follows from neural tangent kernel analysis. In this work, we provide the first global convergence analysis of vanilla local SGD for two-layer neural networks \emph{without overparameterization} and \emph{without injecting noise}, when the input data is Gaussian. The main technical ingredients of our proof are \emph{a self-correction mechanism} and \emph{a new exact recursive characterization of the direction of global model parameters}. The self-correction mechanism guarantees that the algorithm reaches a good region even if it is initialized in a bad region, where a good (bad) region is one in which a gradient-descent update moves the model closer to (farther from) the optimal solution. The main difficulty in establishing self-correction is coping with the gradient dependency between the two layers. To address this challenge, we divide the landscape of the objective into several regions and carefully control the interference between the two layers during the correction process. As a result, we show that local SGD can correct both layers and enter the good region in polynomial time. We then establish a new exact recursive characterization of the direction of the global parameters, which is the key to showing convergence to the global minimum with linear speedup in the number of machines and reduced communication rounds. Experiments on synthetic data confirm the theoretical results.
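To make the setting concrete, below is a minimal sketch of the vanilla local SGD procedure the abstract refers to, applied to a two-layer (one-hidden-layer) ReLU network trained on labels from a Gaussian-input teacher model. The specifics here (the teacher construction, the squared loss, the width, and the hyperparameters K, R, H, and lr) are illustrative assumptions, not the paper's exact setup: each of K machines runs H local SGD steps from the shared global model, and the machines' parameters are averaged once per communication round.

```python
# A minimal sketch of vanilla local SGD for a two-layer ReLU network on
# Gaussian inputs. Hyperparameters and the teacher model are illustrative
# assumptions, not the paper's exact construction.
import numpy as np

rng = np.random.default_rng(0)

d, m = 10, 4          # input dimension and hidden width (illustrative)
K, R, H = 4, 50, 20   # machines, communication rounds, local steps per round
lr = 0.05

# Hypothetical teacher network generating labels y = a* . relu(W* x).
W_star = rng.standard_normal((m, d))
a_star = rng.standard_normal(m)

def grads(W, a, x, y):
    """Gradients of the squared loss 0.5 * (a . relu(W x) - y)^2."""
    h = np.maximum(W @ x, 0.0)                   # hidden-layer activations
    err = a @ h - y                              # prediction residual
    grad_a = err * h                             # dL/da_j = err * h_j
    grad_W = err * np.outer(a * (W @ x > 0), x)  # dL/dw_j = err * a_j * 1[w_j.x>0] * x
    return grad_W, grad_a

# Shared global model; both layers are trained (no overparameterization).
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

for r in range(R):
    local_models = []
    for k in range(K):                           # each machine works independently
        Wk, ak = W.copy(), a.copy()
        for _ in range(H):
            x = rng.standard_normal(d)           # fresh Gaussian input (online SGD)
            y = a_star @ np.maximum(W_star @ x, 0.0)
            gW, ga = grads(Wk, ak, x, y)
            Wk -= lr * gW
            ak -= lr * ga
        local_models.append((Wk, ak))
    # Communication round: average the K local models into the global model.
    W = sum(Wk for Wk, _ in local_models) / K
    a = sum(ak for _, ak in local_models) / K
```

Averaging every H steps rather than every step is what reduces communication; under the paper's Gaussian-data assumptions, the analysis shows the averaged iterates converge to the global minimum with linear speedup in the number of machines K.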
Author Information
Yajie Bao (Shanghai Jiao Tong University)
Amarda Shehu (George Mason University)

Dr. Amarda Shehu is a Professor in the Department of Computer Science in the College of Engineering and Computing at George Mason University, where she is also Associate Vice President of Research for the Institute for Digital InnovAtion. Shehu obtained her Ph.D. from Rice University in 2008, where she was an NIH predoctoral fellow in the Nanobiology Program and was dually trained in AI and molecular biophysics. Shehu's research is at the intersection of AI/ML and scientific inquiry across disciplines. In particular, her laboratory has made significant contributions to uncovering the relationship between macromolecular sequence, structure, dynamics, and function. Shehu has published over 160 technical papers with postdoctoral, graduate, undergraduate, and high-school students. She is a 2022 Fellow of the American Institute for Medical and Biological Engineering (AIMBE) and has received several awards, including the 2022 Outstanding Faculty Award from the State Council of Higher Education for Virginia, the 2021 Beck Family Presidential Medal for Faculty Excellence in Research and Scholarship, the 2018 Mason Teaching Excellence Award, the 2014 Mason Emerging Researcher/Scholar/Creator Award, the 2013 Mason OSCAR Undergraduate Mentor Excellence Award, and the 2012 National Science Foundation (NSF) CAREER Award. Her research is regularly supported by various NSF programs, the Department of Defense, and state and private research awards. Shehu currently chairs the steering committee of the IEEE/ACM Transactions on Computational Biology and Bioinformatics, where she is also an associate editor. Shehu served as an NSF Program Director in the Information and Intelligent Systems Division of the Computer and Information Science and Engineering Directorate during 2019-2022. She was also an Inaugural Founding Co-Director of George Mason University’s Transdisciplinary Center for Advancing Human-Machine Partnerships.
Mingrui Liu (George Mason University)
More from the Same Authors
- 2023 Poster: Federated Learning with Client Subsampling, Data Heterogeneity, and Unbounded Smoothness: A New Algorithm and Lower Bounds
  Michael Crawshaw · Yajie Bao · Mingrui Liu
- 2023 Poster: Bilevel Coreset Selection in Continual Learning: A New Formulation and Algorithm
  Jie Hao · Kaiyi Ji · Mingrui Liu
- 2022 Spotlight: A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
  Mingrui Liu · Zhenxun Zhuang · Yunwen Lei · Chunyang Liao
- 2022 Spotlight: Will Bilevel Optimizers Benefit from Loops
  Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying
- 2022 Poster: A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
  Mingrui Liu · Zhenxun Zhuang · Yunwen Lei · Chunyang Liao
- 2022 Poster: Robustness to Unbounded Smoothness of Generalized SignSGD
  Michael Crawshaw · Mingrui Liu · Francesco Orabona · Wei Zhang · Zhenxun Zhuang
- 2022 Poster: Multi-objective Deep Data Generation with Correlated Property Control
  Shiyu Wang · Xiaojie Guo · Xuanyang Lin · Bo Pan · Yuanqi Du · Yinkai Wang · Yanfang Ye · Ashley Petersen · Austin Leitgeb · Saleh Alkhalifa · Kevin Minbiole · William M. Wuest · Amarda Shehu · Liang Zhao
- 2022 Poster: Will Bilevel Optimizers Benefit from Loops
  Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying
- 2021 Poster: Generalization Guarantee of SGD for Pairwise Learning
  Yunwen Lei · Mingrui Liu · Yiming Ying
- 2020 Poster: Improved Schemes for Episodic Memory-based Lifelong Learning
  Yunhui Guo · Mingrui Liu · Tianbao Yang · Tajana S Rosing
- 2020 Spotlight: Improved Schemes for Episodic Memory-based Lifelong Learning
  Yunhui Guo · Mingrui Liu · Tianbao Yang · Tajana S Rosing
- 2020 Poster: A Decentralized Parallel Algorithm for Training Generative Adversarial Nets
  Mingrui Liu · Wei Zhang · Youssef Mroueh · Xiaodong Cui · Jarret Ross · Tianbao Yang · Payel Das