Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, result set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g., web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflects broad patterns in external data. Using thirteen datasets, we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data.
Author Information
David Newman (University of California, Irvine)
Edwin Bonilla (CSIRO's Data61)
Wray Buntine
More from the Same Authors
- 2019 Outstanding Contribution Talk: Variational Graph Convolutional Networks
  Edwin Bonilla
- 2019 Poster: Structured Variational Inference in Continuous Cox Process Models
  Virginia Aglietti · Edwin Bonilla · Theodoros Damoulas · Sally Cripps
- 2015 Poster: Scalable Inference for Gaussian Process Models with Black-Box Likelihoods
  Amir Dezfouli · Edwin Bonilla
- 2014 Poster: Extended and Unscented Gaussian Processes
  Daniel M Steinberg · Edwin Bonilla
- 2014 Spotlight: Extended and Unscented Gaussian Processes
  Daniel M Steinberg · Edwin Bonilla
- 2014 Poster: Automated Variational Inference for Gaussian Process Models
  Trung V Nguyen · Edwin Bonilla
- 2013 Workshop: Machine Learning for Sustainability
  Edwin Bonilla · Thomas Dietterich · Theodoros Damoulas · Andreas Krause · Daniel Sheldon · Iadine Chades · J. Zico Kolter · Bistra Dilkina · Carla Gomes · Hugo P Simao
- 2010 Poster: Gaussian Process Preference Elicitation
  Edwin Bonilla · Shengbo Guo · Scott Sanner
- 2007 Poster: Multi-task Gaussian Process Prediction
  Edwin Bonilla · Kian Ming A Chai · Chris Williams
- 2007 Spotlight: Multi-task Gaussian Process Prediction
  Edwin Bonilla · Kian Ming A Chai · Chris Williams
- 2007 Spotlight: Distributed Inference for Latent Dirichlet Allocation
  David Newman · Arthur Asuncion · Padhraic Smyth · Max Welling
- 2007 Poster: Distributed Inference for Latent Dirichlet Allocation
  David Newman · Arthur Asuncion · Padhraic Smyth · Max Welling
- 2006 Poster: A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation
  Yee Whye Teh · David Newman · Max Welling