Online Bayesian Moment Matching for Topic Modeling with Unknown Number of Topics

Wei-Shou Hsu · Pascal Poupart

Area 5+6+7+8 #90

Keywords: [ (Application) Natural Language and Text Processing ] [ (Other) Bayesian Inference ] [ Online Learning ] [ (Other) Unsupervised Learning Methods ]


Latent Dirichlet Allocation (LDA) is a very popular model for topic modeling as well as many other problems with latent groups. It is both simple and effective. When the number of topics (or latent groups) is unknown, the Hierarchical Dirichlet Process (HDP) provides an elegant non-parametric extension; however, it is a complex model and it is difficult to incorporate prior knowledge since the distribution over topics is implicit. We propose two new models that extend LDA in a simple and intuitive fashion by directly expressing a distribution over the number of topics. We also propose a new online Bayesian moment matching technique to learn the parameters and the number of topics of those models based on streaming data. The approach achieves higher log-likelihood than batch and online HDP with fixed hyperparameters on several corpora.
