

Geometric Dirichlet Means Algorithm for topic inference

Mikhail Yurochkin · XuanLong Nguyen

Area 5+6+7+8 #128

Keywords: [ (Application) Natural Language and Text Processing ] [ (Other) Machine Learning Topics ] [ (Other) Probabilistic Models and Methods ] [ Matrix Factorization ] [ (Other) Unsupervised Learning Methods ] [ Clustering ]


We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end, we study the optimization of a geometric loss function, which is a surrogate for the LDA likelihood. Our method involves a fast optimization-based weighted clustering procedure augmented with geometric corrections, which overcomes the computational and statistical inefficiencies encountered by other techniques based on Gibbs sampling and variational inference, while achieving accuracy comparable to that of a Gibbs sampler. The topic estimates produced by our method are shown to be statistically consistent under some conditions. The algorithm is evaluated with extensive experiments on simulated and real data.
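
The abstract does not spell out the procedure, so the following is only a minimal sketch of what "weighted clustering augmented with geometric corrections" could look like, not the authors' algorithm. It assumes that documents are represented as normalized word-frequency vectors, that clustering weights are document lengths, and that the correction pushes each cluster centroid outward along the ray from the corpus-wide centroid. The function names and the `scale` parameter are hypothetical and chosen for illustration.

```python
import numpy as np

def weighted_kmeans(X, w, k, n_iter=50, seed=0):
    """Lloyd-style k-means where each point contributes with weight w[i]."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the old center if a cluster is empty
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return centers, labels

def extend_from_center(centers, global_center, scale):
    """Assumed geometric correction: move centroids away from the global
    centroid by a factor `scale`, then project back to valid distributions."""
    ext = global_center + scale * (centers - global_center)
    ext = np.clip(ext, 0.0, None)                  # no negative probabilities
    return ext / ext.sum(axis=1, keepdims=True)    # rows sum to 1

def geometric_topics_sketch(counts, k, scale=1.5, seed=0):
    """counts: (documents x vocabulary) word-count matrix, no empty documents."""
    counts = np.asarray(counts, dtype=float)
    lengths = counts.sum(axis=1)                   # document lengths as weights
    X = counts / lengths[:, None]                  # points on the word simplex
    centers, labels = weighted_kmeans(X, lengths, k, seed=seed)
    global_center = np.average(X, axis=0, weights=lengths)
    topics = extend_from_center(centers, global_center, scale)
    return topics, labels
```

The intuition behind the sketch is that documents generated by a topic model lie inside the convex hull of the topics, so plain cluster centroids sit too close to the middle of that hull; moving them outward is one simple way to approximate its vertices. How far to move them is left here as a fixed, hand-set `scale`, which is an illustrative simplification.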
