
 
Poster
Asynchronous Distributed Learning of Topic Models
Arthur Asuncion · Padhraic Smyth · Max Welling

Wed Dec 10 07:30 PM -- 12:00 AM (PST)

Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known unsupervised learning frameworks: Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Processes (HDP). In the proposed approach, the data are distributed across P processors, and processors independently perform Gibbs sampling on their local data and communicate their information in a local asynchronous manner with other processors. We demonstrate that our asynchronous algorithms are able to learn global topic models that are statistically as accurate as those learned by the standard LDA and HDP samplers, but with significant improvements in computation time and memory. We show speedup results on a 730-million-word text corpus using 32 processors, and we provide perplexity results for up to 1500 virtual processors. As a stepping stone in the development of asynchronous HDP, a parallel HDP sampler is also introduced.
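In the abstract, each processor Gibbs-samples its own shard and exchanges counts with other processors locally and asynchronously. Below is a minimal single-machine simulation of that idea for LDA. It is a sketch under stated assumptions, not the authors' implementation. It assumes the standard collapsed Gibbs update for LDA. It also assumes each processor keeps a possibly stale sum of the other processors' word-topic counts. Finally, it assumes a pairwise exchange in which a processor first subtracts the copy it last received from a peer and then adds that peer's current counts. The names `Processor`, `sweep`, `receive`, and `async_lda`, and the hyperparameters `alpha=0.1` and `beta=0.01`, are hypothetical. Processors run one after another here, and random pairing stands in for truly asynchronous communication.

```python
import random

# Sketch (assumption): sequential simulation of asynchronous distributed LDA.
# Not the authors' code; all names and hyperparameters are hypothetical.


class Processor:
    """One simulated processor holding a shard of documents."""

    def __init__(self, pid, docs, K, W, rng):
        self.pid, self.docs, self.K, self.W = pid, docs, K, W
        # Random initial topic assignment for every local token.
        self.z = [[rng.randrange(K) for _ in doc] for doc in docs]
        self.ndk = [[0] * K for _ in docs]          # doc-topic counts
        self.nwk = [[0] * K for _ in range(W)]      # local word-topic counts
        self.nk = [0] * K                           # local topic totals
        for d, doc in enumerate(docs):
            for w, k in zip(doc, self.z[d]):
                self.ndk[d][k] += 1
                self.nwk[w][k] += 1
                self.nk[k] += 1
        # Possibly stale estimate of all *other* processors' word-topic counts.
        self.nwk_other = [[0] * K for _ in range(W)]
        self.nk_other = [0] * K
        # Last copy received from each peer, so a fresh copy replaces a stale one.
        self.cache = {}

    def sweep(self, alpha, beta, rng):
        """One collapsed Gibbs sweep over the local documents."""
        K, W = self.K, self.W
        for d, doc in enumerate(self.docs):
            for i, w in enumerate(doc):
                k_old = self.z[d][i]
                # Take the current token out of the local counts.
                self.ndk[d][k_old] -= 1
                self.nwk[w][k_old] -= 1
                self.nk[k_old] -= 1
                # Unnormalized p(z = k | rest), using local + estimated remote counts.
                weights = [
                    (self.ndk[d][k] + alpha)
                    * (self.nwk[w][k] + self.nwk_other[w][k] + beta)
                    / (self.nk[k] + self.nk_other[k] + W * beta)
                    for k in range(K)
                ]
                k_new = rng.choices(range(K), weights=weights)[0]
                self.z[d][i] = k_new
                self.ndk[d][k_new] += 1
                self.nwk[w][k_new] += 1
                self.nk[k_new] += 1

    def receive(self, peer):
        """Replace the stale copy of peer's local counts with its current counts."""
        old = self.cache.get(peer.pid)
        new = [row[:] for row in peer.nwk]
        for w in range(self.W):
            for k in range(self.K):
                self.nwk_other[w][k] += new[w][k] - (old[w][k] if old else 0)
        self.nk_other = [sum(row[k] for row in self.nwk_other) for k in range(self.K)]
        self.cache[peer.pid] = new


def async_lda(corpus, P, K, W, iters=20, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    # Round-robin split of the documents across P simulated processors.
    procs = [Processor(p, corpus[p::P], K, W, rng) for p in range(P)]
    for _ in range(iters):
        # In a real system these sweeps would run concurrently on P machines.
        for proc in procs:
            proc.sweep(alpha, beta, rng)
        # Random pairwise exchange: only the two processors in a pair talk.
        order = list(range(P))
        rng.shuffle(order)
        for a, b in zip(order[::2], order[1::2]):
            procs[a].receive(procs[b])
            procs[b].receive(procs[a])
    return procs


if __name__ == "__main__":
    corpus = [[0, 1, 2, 0, 1], [3, 4, 5, 3], [0, 2, 1], [4, 5, 3, 5]] * 3
    procs = async_lda(corpus, P=4, K=2, W=6)
    print(procs[0].nk, procs[0].nk_other)
```

Subtracting the cached copy before adding the new one means a processor never double-counts a peer it has already heard from. Between exchanges, though, it still samples with slightly stale information about the rest of the corpus. This sketch assumes that tolerating such staleness is the trade-off the asynchronous scheme makes in exchange for avoiding global synchronization.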

Author Information

Arthur Asuncion (University of California, Irvine)
Padhraic Smyth (University of California, Irvine)
Max Welling (Microsoft Research AI4Science / University of Amsterdam)
