Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization
Tianyi Liu · Shiyang Li · Jianping Shi · Enlu Zhou · Tuo Zhao

Wed Dec 05 07:45 AM -- 09:45 AM (PST) @ Room 210 #67

Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., for training large collaborative filtering systems and deep neural networks. Given the limits of current analytical techniques, however, establishing convergence properties of Async-MSGD for these highly complicated nonconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problem: streaming PCA. This allows us to make progress toward understanding Async-MSGD and to gain new insights for more general problems. Specifically, by exploiting the diffusion approximation of stochastic optimization, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA. Our results indicate a fundamental tradeoff between asynchrony and momentum: to ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt at understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and the training of deep neural networks are provided to support our findings for Async-MSGD.
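
For readers who want a concrete picture of the setting, the following is a minimal illustrative sketch, not the authors' code. It assumes an Oja-style streaming PCA update in which the stochastic gradient is evaluated at an iterate that is `tau` steps stale (a simple stand-in for asynchrony) and a heavy-ball momentum term with weight `mu` is added. The function name, the diagonal-covariance data model, and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np
from collections import deque


def async_msgd_streaming_pca(dim=5, steps=20000, eta=1e-3, mu=0.5, tau=4, seed=0):
    """Toy momentum SGD for streaming PCA with gradients computed at a stale iterate.

    Assumed update (illustrative only):
        v_next = v + mu * (v - v_prev) + eta * (I - s s^T) x x^T s,
    where s is the iterate from `tau` steps earlier.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical data model: independent coordinates, so the covariance is
    # diagonal and its leading eigenvector is the first basis vector e_1.
    stds = np.ones(dim)
    stds[0] = 2.0

    v = rng.normal(size=dim)
    v /= np.linalg.norm(v)
    v_prev = v.copy()

    # Keep the last tau + 1 iterates; history[0] is the one from tau steps ago.
    history = deque([v.copy()] * (tau + 1), maxlen=tau + 1)

    for _ in range(steps):
        x = stds * rng.normal(size=dim)      # one streaming sample
        stale = history[0]                   # delayed iterate
        g = x * (x @ stale)                  # x x^T s
        g -= stale * (stale @ g)             # apply (I - s s^T)
        v_next = v + mu * (v - v_prev) + eta * g
        v_prev, v = v, v_next
        history.append(v.copy())

    return v / np.linalg.norm(v)


if __name__ == "__main__":
    v_hat = async_msgd_streaming_pca()
    # Alignment with the leading eigenvector e_1 (up to sign).
    print(abs(v_hat[0]))
```

In this toy setup, varying `mu` and `tau` together gives a rough, informal feel for the tradeoff the abstract describes: a larger delay generally calls for a smaller momentum for the iterates to stay stable.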

Author Information

Tianyi Liu (Georgia Institute of Technology)
Shiyang Li (University of California, Santa Barbara)

Ph.D. student at UCSB

Jianping Shi (Sensetime Group Limited)
Enlu Zhou (Georgia Institute of Technology)
Tuo Zhao (Georgia Tech)

More from the Same Authors