
Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
Chong Wang · David Blei

Tue Dec 08 03:30 PM -- 03:31 PM (PST)

We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the "topics"). In the sparse topic model (STM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Each topic is therefore associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the STM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the STM on four real-world datasets, where the empirical results show that STMs give better predictive performance than traditional approaches while inferring simpler models.
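The decoupling can be illustrated with a minimal generative sketch of a single sparse topic. The selector variables decide *which* terms the topic can use (sparsity), and a Dirichlet draw over only those terms decides *how* probability is spread among them (smoothness). This is a sketch under stated assumptions, not the authors' specification. The function name `sample_sparse_topic` and the parameters `pi` (a fixed Bernoulli probability for each selector) and `gamma` (a symmetric Dirichlet parameter on the selected subset) are hypothetical. The rule that forces one term on when no selector fires is also an assumption added so the draw is always a valid distribution. The sketch omits the hierarchical Dirichlet process layer and the Gibbs sampler entirely.

```python
import random


def sample_sparse_topic(vocab_size, pi, gamma, rng=None):
    """Draw one sparse topic: selector bits, then a Dirichlet over the selected terms.

    Hypothetical, illustrative assumptions (not the paper's exact model):
      - each selector b_v ~ Bernoulli(pi), independently per vocabulary term;
      - weights on the selected terms ~ symmetric Dirichlet(gamma);
      - unselected terms get probability exactly 0.
    """
    rng = rng or random.Random()
    # Sparsity: selector variables choose which vocabulary terms appear in this topic.
    selectors = [rng.random() < pi for _ in range(vocab_size)]
    # Assumed guard: an empty subset cannot carry a distribution, so switch one term on.
    if not any(selectors):
        selectors[rng.randrange(vocab_size)] = True
    # Smoothness: a Dirichlet draw on the subset only, via normalized Gamma(gamma, 1) variates.
    raw = [rng.gammavariate(gamma, 1.0) if on else 0.0 for on in selectors]
    total = sum(raw)
    topic = [x / total for x in raw]
    return selectors, topic


if __name__ == "__main__":
    # Usage: a 10-term vocabulary where roughly 30% of terms are expected to be selected.
    sel, beta = sample_sparse_topic(vocab_size=10, pi=0.3, gamma=0.5, rng=random.Random(0))
    for v, (on, p) in enumerate(zip(sel, beta)):
        print(v, on, round(p, 3))
```

Because `pi` and `gamma` are separate knobs, a topic can be sparse (few selected terms) yet smooth over those terms, or dense yet peaked. A single symmetric Dirichlet over the full vocabulary ties these two properties to one parameter.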

Author Information

Chong Wang (Apple)
David Blei (Columbia University)
