Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., for training large collaborative filtering systems and deep neural networks. Due to current technical limitations, however, establishing convergence properties of Async-MSGD for these highly complicated nonconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problem: streaming PCA. This allows us to make progress toward understanding Async-MSGD and to gain new insights for more general problems. Specifically, by exploiting the diffusion approximation of stochastic optimization, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA. Our results indicate a fundamental tradeoff between asynchrony and momentum: to ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt at understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.
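To make the setting concrete, the following is a minimal single-process sketch of a momentum update for streaming PCA in which the stochastic gradient is evaluated at a stale iterate. It is an illustration under stated assumptions, not the authors' implementation: the function name `async_msgd_streaming_pca`, the `sample` callback, the fixed delay `tau`, the Oja-style projected gradient, and all default values are hypothetical choices made for this example.

```python
import numpy as np
from collections import deque


def async_msgd_streaming_pca(sample, dim, steps, eta=1e-3, mu=0.5, tau=4, seed=0):
    """Simulate momentum SGD with a fixed gradient delay on streaming PCA.

    Assumptions (not taken from the abstract): asynchrony is modeled as a
    constant delay of `tau` iterations, and the stochastic gradient is the
    Oja-style term (I - v v^T) x x^T v evaluated at the stale iterate.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    v_prev = v.copy()
    # Holds the last tau + 1 iterates; history[0] is the iterate from tau steps ago.
    history = deque([v.copy()] * (tau + 1), maxlen=tau + 1)

    for _ in range(steps):
        x = sample(rng)                      # one streaming data point
        stale = history[0]                   # delayed iterate seen by the "worker"
        xv = x @ stale
        grad = xv * x - (xv ** 2) * stale    # (I - v v^T) x x^T v at the stale iterate
        v_next = v + mu * (v - v_prev) + eta * grad
        v_prev, v = v, v_next
        history.append(v.copy())

    return v / np.linalg.norm(v)


if __name__ == "__main__":
    # Hypothetical data stream: Gaussian with covariance diag(4, 1, 1, 1, 1).
    scales = np.sqrt(np.array([4.0, 1.0, 1.0, 1.0, 1.0]))
    estimate = async_msgd_streaming_pca(
        lambda rng: scales * rng.standard_normal(5), dim=5, steps=20000
    )
    print("alignment with top eigenvector:", abs(estimate[0]))
```

In this sketch, increasing `tau` or `mu` while keeping `eta` fixed can be used to probe the tradeoff between asynchrony and momentum that the abstract describes; setting `tau=0` recovers the synchronous momentum update.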
Author Information
Tianyi Liu (Georgia Institute of Technology)
Shiyang Li (University of California, Santa Barbara)
Ph.D. student@UCSB
Jianping Shi (Sensetime Group Limited)
Enlu Zhou (Georgia Institute of Technology)
Tuo Zhao (Georgia Tech)
More from the Same Authors
-
2022 : RLCG: When Reinforcement Learning Meets Coarse Graining »
Shenghao Wu · Tianyi Liu · Zhirui Wang · Wen Yan · Yingxiang Yang -
2023 : Machine Learning Force Fields with Data Cost Aware Training »
Alexander Bukharin · Tianyi Liu · Shengjie Wang · Simiao Zuo · Weihao Gao · Wen Yan · Tuo Zhao -
2023 Poster: Bayesian Risk-Averse Q-Learning with Streaming Observations »
Yuhao Wang · Enlu Zhou -
2022 Poster: Bayesian Risk Markov Decision Processes »
Yifan Lin · Yuxuan Ren · Enlu Zhou -
2020 Poster: Bayesian Optimization of Risk Measures »
Sait Cakmak · Raul Astudillo · Peter Frazier · Enlu Zhou -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 Poster: Towards Understanding the Importance of Shortcut Connections in Residual Networks »
Tianyi Liu · Minshuo Chen · Mo Zhou · Simon Du · Enlu Zhou · Tuo Zhao -
2019 Poster: Meta Learning with Relational Information for Short Sequences »
Yujia Xie · Haoming Jiang · Feng Liu · Tuo Zhao · Hongyuan Zha -
2019 Poster: Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds »
Minshuo Chen · Haoming Jiang · Wenjing Liao · Tuo Zhao -
2018 Poster: Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization »
Minshuo Chen · Lin Yang · Mengdi Wang · Tuo Zhao -
2018 Poster: The Physical Systems Behind Optimization Algorithms »
Lin Yang · Raman Arora · Vladimir Braverman · Tuo Zhao -
2018 Poster: Sequential Context Encoding for Duplicate Removal »
Lu Qi · Shu Liu · Jianping Shi · Jiaya Jia -
2018 Poster: FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction »
Shuyang Sun · Jiangmiao Pang · Jianping Shi · Shuai Yi · Wanli Ouyang -
2017 Poster: Deep Hyperspherical Learning »
Weiyang Liu · Yan-Ming Zhang · Xingguo Li · Zhiding Yu · Bo Dai · Tuo Zhao · Le Song -
2017 Spotlight: Deep Hyperspherical Learning »
Weiyang Liu · Yan-Ming Zhang · Xingguo Li · Zhiding Yu · Bo Dai · Tuo Zhao · Le Song -
2017 Poster: Parametric Simplex Method for Sparse Learning »
Haotian Pang · Han Liu · Robert J Vanderbei · Tuo Zhao -
2017 Poster: On Quadratic Convergence of DC Proximal Newton Algorithm in Nonconvex Sparse Learning »
Xingguo Li · Lin Yang · Jason Ge · Jarvis Haupt · Tong Zhang · Tuo Zhao