Simple, Distributed, and Accelerated Probabilistic Programming
Dustin Tran · Matthew Hoffman · Dave Moore · Christopher Suter · Srinivas Vasudevan · Alexey Radul · Matthew Johnson · Rif A. Saurous

Wed Dec 05 07:45 AM -- 09:45 AM (PST) @ Room 210 #46

We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. In particular, we distill probabilistic programming down to a single abstraction—the random variable. Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-parallel autoregressive model (Image Transformer) with TPUv2s; and a multi-GPU No-U-Turn Sampler (NUTS). For both a state-of-the-art VAE on 64x64 ImageNet and Image Transformer on 256x256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips. With NUTS, we see a 100x speedup on GPUs over Stan and a 37x speedup over PyMC3.
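To illustrate the core idea of distilling probabilistic programming down to a single random-variable abstraction, here is a minimal, hypothetical sketch in plain Python/NumPy (not the paper's actual TensorFlow implementation): a `RandomVariable` that samples a value on construction, carries its log-density, and composes like an ordinary numeric value so models can be written as regular programs.

```python
import numpy as np

class RandomVariable:
    """Minimal random-variable abstraction: wraps a sampler and a
    log-density, and behaves like its sampled value downstream."""
    def __init__(self, sample_fn, log_prob_fn):
        self.value = sample_fn()      # sample once on construction
        self.log_prob = log_prob_fn   # density, queryable later

    # Delegate arithmetic so the variable composes like an ordinary array.
    def __add__(self, other):
        return self.value + getattr(other, "value", other)
    def __mul__(self, other):
        return self.value * getattr(other, "value", other)

def Normal(loc, scale, seed=0):
    """Hypothetical constructor for a Gaussian random variable."""
    rng = np.random.default_rng(seed)
    return RandomVariable(
        lambda: rng.normal(loc, scale),
        lambda x: (-0.5 * ((x - loc) / scale) ** 2
                   - np.log(scale * np.sqrt(2 * np.pi))),
    )

# A two-line probabilistic program: a latent z feeding an affine transform,
# written as ordinary numeric composition.
z = Normal(0.0, 1.0)
x_mean = z * 2.0 + 1.0
```

In the paper's actual system the analogous abstraction is backed by TensorFlow tensors, which is what lets models run unchanged on TPUs and GPUs; the sketch above only conveys the "random variables as ordinary program values" idea.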

Author Information

Dustin Tran (Google Brain)
Matthew Hoffman (Google)
Dave Moore (Google)
Christopher Suter (Google)
Srinivas Vasudevan (Google)
Alexey Radul (Google)
Matthew Johnson (Google Brain)

Matt Johnson is a research scientist at Google Brain interested in software systems powering machine learning research. He is the tech lead for JAX, a system for composable function transformations in Python. He was a postdoc at Harvard University with Ryan Adams, working on composing graphical models with neural networks and applications in neurobiology. His Ph.D. is from MIT, where he worked with Alan Willsky on Bayesian nonparametrics, time series models, and scalable inference.

Rif A. Saurous (Google)