
Global Convergence Analysis of Local SGD for Two-layer Neural Network without Overparameterization
Yajie Bao · Amarda Shehu · Mingrui Liu

Thu Dec 14 08:45 AM -- 10:45 AM (PST) @ Great Hall & Hall B1+B2 #905

Local SGD, a cornerstone algorithm in federated learning, is widely used in training deep neural networks and has shown strong empirical performance. A theoretical understanding of such performance on nonconvex loss landscapes is currently lacking. Analyzing the global convergence of SGD is challenging because the noise depends on the model parameters. Indeed, many works narrow their focus to GD and rely on injecting noise to enable convergence to the local or global optimum. When expanding the focus to local SGD, existing analyses in the nonconvex case can only guarantee finding stationary points, or assume the neural network is overparameterized so as to guarantee convergence to the global minimum through neural tangent kernel analysis. In this work, we provide the first global convergence analysis of vanilla local SGD for two-layer neural networks without overparameterization and without injecting noise, when the input data is Gaussian. The main technical ingredients of our proof are a self-correction mechanism and a new exact recursive characterization of the direction of global model parameters. The self-correction mechanism guarantees the algorithm reaches a good region even if the initialization is in a bad region, where a good (bad) region is one in which a gradient descent update moves the model closer to (away from) the optimal solution. The main difficulty in establishing self-correction is coping with the gradient dependency between the two layers. To address this challenge, we divide the landscape of the objective into several regions to carefully control the interference of the two layers during the correction process. As a result, we show that local SGD can correct the two layers and enter the good region in polynomial time. After that, we establish a new exact recursive characterization of the direction of the global parameters, which is the key to showing convergence to the global minimum with linear speedup in the number of machines and reduced communication rounds. Experiments on synthetic data confirm the theoretical results.
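To make the algorithmic setup concrete, the following is a minimal sketch of vanilla local SGD in the setting the abstract describes: a two-layer ReLU network trained on synthetic Gaussian inputs with labels from a teacher network, where K machines each take H local SGD steps between communication rounds and the server averages their parameters. All names and hyperparameters (K, H, R, the learning rate, and the network widths) are illustrative choices for this sketch, not the paper's actual settings or constants.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 10, 4          # input dimension, hidden width (illustrative)
K, H, R = 4, 10, 200  # machines, local steps per round, communication rounds
lr, batch = 0.05, 32  # learning rate and minibatch size (illustrative)

# Teacher two-layer network generating labels for synthetic data
W_star = rng.standard_normal((m, d)) / np.sqrt(d)
a_star = rng.standard_normal(m) / np.sqrt(m)

def forward(W, a, X):
    """Two-layer ReLU network: a^T relu(W x) applied row-wise to X."""
    return np.maximum(X @ W.T, 0.0) @ a

def grads(W, a, X, y):
    """Gradients of the mean squared error w.r.t. both layers."""
    h = X @ W.T                   # pre-activations, shape (n, m)
    act = np.maximum(h, 0.0)
    err = act @ a - y             # residuals, shape (n,)
    ga = act.T @ err / len(y)     # gradient w.r.t. second-layer weights
    gW = ((err[:, None] * (h > 0)) * a).T @ X / len(y)  # first layer
    return gW, ga

# Shared initialization broadcast to every machine
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)

for r in range(R):
    W_locals, a_locals = [], []
    for k in range(K):            # each machine runs H local SGD steps
        Wk, ak = W.copy(), a.copy()
        for _ in range(H):
            X = rng.standard_normal((batch, d))  # fresh Gaussian inputs
            y = forward(W_star, a_star, X)
            gW, ga = grads(Wk, ak, X, y)
            Wk -= lr * gW
            ak -= lr * ga
        W_locals.append(Wk)
        a_locals.append(ak)
    W = np.mean(W_locals, axis=0)  # server averages the local models
    a = np.mean(a_locals, axis=0)

X_test = rng.standard_normal((1000, d))
mse = np.mean((forward(W, a, X_test) - forward(W_star, a_star, X_test)) ** 2)
```

Note that both layers are trained jointly, which is where the gradient dependency between layers mentioned above arises; the paper's analysis, not reproduced here, shows how the averaged iterates nonetheless reach the global minimum.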

Author Information

Yajie Bao (Shanghai Jiao Tong University)
Amarda Shehu (George Mason University)

Dr. Amarda Shehu is a Professor in the Department of Computer Science in the College of Engineering and Computing at George Mason University, where she is also Associate Vice President of Research for the Institute for Digital InnovAtion. Shehu obtained her Ph.D. from Rice University in 2008, where she was also an NIH predoctoral fellow in the Nanobiology Program and was dually trained in AI and Molecular Biophysics. Shehu's research is at the intersection of AI/ML and scientific inquiry across disciplines. In particular, her laboratory has made significant contributions to uncovering the relationship between macromolecular sequence, structure, dynamics, and function. Shehu has published over 160 technical papers with postdoctoral, graduate, undergraduate, and high-school students. She is a 2022 Fellow of the American Institute for Medical and Biological Engineering (AIMBE) and has received several awards, including the 2022 Outstanding Faculty Award from the State Council of Higher Education for Virginia, the 2021 Beck Family Presidential Medal for Faculty Excellence in Research and Scholarship, the 2018 Mason University Teaching Excellence Award, the 2014 Mason Emerging Researcher/Scholar/Creator Award, the 2013 Mason OSCAR Undergraduate Mentor Excellence Award, and the 2012 National Science Foundation (NSF) CAREER Award. Her research is regularly supported by various NSF programs, the Department of Defense, and state and private research awards. Shehu is currently the chair of the steering committee of the IEEE/ACM Transactions on Computational Biology and Bioinformatics, where she is also an associate editor. Shehu served as an NSF Program Director in the Information and Intelligent Systems Division of the Computer and Information Science and Engineering Directorate during 2019-2022. She was also an Inaugural Founding Co-Director of George Mason University's Transdisciplinary Center for Advancing Human-Machine Partnerships.

Mingrui Liu (George Mason University)
