Stochastic Gradient MCMC with Stale Gradients
Changyou Chen · Nan Ding · Chunyuan Li · Yizhe Zhang · Lawrence Carin

Mon Dec 05 09:00 AM -- 12:30 PM (PST) @ Area 5+6+7+8 #134

Stochastic gradient MCMC (SG-MCMC) has played an important role in large-scale Bayesian learning, with well-developed theoretical convergence properties. In such applications of SG-MCMC, it is becoming increasingly popular to employ distributed systems, where stochastic gradients are computed based on some outdated parameters, yielding what are termed stale gradients. While stale gradients could be directly used in SG-MCMC, their impact on convergence properties has not been well studied. In this paper we develop theory to show that while the bias and MSE of an SG-MCMC algorithm depend on the staleness of stochastic gradients, its estimation variance (relative to the expected estimate, based on a prescribed number of samples) is independent of it. In a simple Bayesian distributed system with SG-MCMC, where stale gradients are computed asynchronously by a set of workers, our theory indicates a linear speedup on the decrease of estimation variance w.r.t. the number of workers. Experiments on synthetic data and deep neural networks validate our theory, demonstrating the effectiveness and scalability of SG-MCMC with stale gradients.
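To make the notion of a stale gradient concrete, here is a minimal single-process sketch. It is an illustration, not the authors' distributed implementation. It runs stochastic gradient Langevin dynamics (one member of the SG-MCMC family) on a toy model chosen for this example: the posterior over the mean of a unit-variance Gaussian with a standard normal prior. The function name `sgld_stale`, the fixed `staleness` delay, and all parameter values are assumptions introduced here.

```python
import numpy as np


def sgld_stale(n_steps, step_size, staleness, data, rng, batch_size=10):
    """Toy SGLD whose gradient is evaluated at parameters `staleness` steps old.

    Hypothetical model for illustration: x_i ~ N(theta, 1), prior theta ~ N(0, 1).
    """
    n = len(data)
    theta = 0.0
    history = [theta]  # history[t] holds the parameter before step t
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # A worker would have read this outdated copy of the parameter.
        stale_theta = history[max(0, t - staleness)]
        batch = rng.choice(data, size=batch_size, replace=False)
        # Unbiased minibatch estimate of the log-posterior gradient at stale_theta:
        # prior term (-theta) plus rescaled likelihood term.
        grad = -stale_theta + (n / batch_size) * np.sum(batch - stale_theta)
        # Langevin update: gradient step plus injected Gaussian noise.
        theta = theta + step_size * grad + np.sqrt(2 * step_size) * rng.standard_normal()
        history.append(theta)
        samples[t] = theta
    return samples


if __name__ == "__main__":
    data = np.random.default_rng(0).normal(1.0, 1.0, size=100)
    exact_mean = data.sum() / (len(data) + 1)  # closed-form posterior mean
    for tau in (0, 5):
        draws = sgld_stale(20000, 1e-3, tau, data, np.random.default_rng(1))
        print(tau, draws[2000:].mean(), draws[2000:].var(), exact_mean)
```

Setting `staleness=0` recovers ordinary SGLD, so the two runs can be compared directly. The step size is kept small because a delayed gradient narrows the range of step sizes for which the iteration stays stable.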

Author Information

Changyou Chen (University at Buffalo)
Nan Ding (Google)
Chunyuan Li (Duke)

Chunyuan is a PhD student at Duke University, affiliated with the Department of Electrical and Computer Engineering and advised by Prof. Lawrence Carin. His recent research focuses on scalable Bayesian methods for deep learning, including generative models and reinforcement learning, with applications to computer vision and natural language processing.

Yizhe Zhang (Duke University)
Lawrence Carin (KAUST)