Timezone: »
Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such "smoothing parameters" have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document-topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic-word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.
Author Information
Hanna Wallach (Microsoft)
David Mimno (Cornell University)
Andrew McCallum (UMass Amherst)
Related Events (a corresponding poster, oral, or spotlight)
-
2009 Poster: Rethinking LDA: Why Priors Matter »
Tue. Dec 8th 03:00 -- 07:59 AM Room
More from the Same Authors
-
2021 : CSFCube - A Test Collection of Computer Science Research Articles for Faceted Query by Example »
Sheshera Mysore · Tim O'Gorman · Andrew McCallum · Hamed Zamani -
2022 : Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model »
Jacob Eisenstein · Daniel Andor · Bernd Bohnet · Michael Collins · David Mimno -
2022 Poster: Modeling Transitivity and Cyclicity in Directed Graphs via Binary Code Box Embeddings »
Dongxu Zhang · Michael Boratko · Cameron Musco · Andrew McCallum -
2022 Poster: Structured Energy Network As a Loss »
Jay Yoon Lee · Dhruvesh Patel · Purujit Goyal · Wenlong Zhao · Zhiyang Xu · Andrew McCallum -
2021 Panel: How Should a Machine Learning Researcher Think About AI Ethics? »
Amanda Askell · Abeba Birhane · Jesse Dodge · Casey Fiesler · Pascale N Fung · Hanna Wallach -
2021 Poster: Capacity and Bias of Learned Geometric Embeddings for Directed Graphs »
Michael Boratko · Dongxu Zhang · Nicholas Monath · Luke Vilnis · Kenneth L Clarkson · Andrew McCallum -
2020 : Panel & Closing »
Tamara Broderick · Laurent Dinh · Neil Lawrence · Kristian Lum · Hanna Wallach · Sinead Williamson -
2020 : Morning keynote »
Hanna Wallach · Rosie Campbell -
2020 Workshop: I Can’t Believe It’s Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning »
Jessica Forde · Francisco Ruiz · Melanie Fernandez Pradier · Aaron Schein · Finale Doshi-Velez · Isabel Valera · David Blei · Hanna Wallach -
2020 Poster: Improving Local Identifiability in Probabilistic Box Embeddings »
Shib Dasgupta · Michael Boratko · Dongxu Zhang · Luke Vilnis · Xiang Li · Andrew McCallum -
2019 : Coffee Break & Poster Session 2 »
Juho Lee · Yoonho Lee · Yee Whye Teh · Raymond A. Yeh · Yuan-Ting Hu · Alex Schwing · Sara Ahmadian · Alessandro Epasto · Marina Knittel · Ravi Kumar · Mohammad Mahdian · Christian Bueno · Aditya Sanghi · Pradeep Kumar Jayaraman · Ignacio Arroyo-Fernández · Andrew Hryniowski · Vinayak Mathur · Sanjay Singh · Shahrzad Haddadan · Vasco Portilheiro · Luna Zhang · Mert Yuksekgonul · Jhosimar Arias Figueroa · Deepak Maurya · Balaraman Ravindran · Frank NIELSEN · Philip Pham · Justin Payan · Andrew McCallum · Jinesh Mehta · Ke SUN -
2019 : Opening Remarks »
Manzil Zaheer · Nicholas Monath · Ari Kobren · Junier Oliva · Barnabas Poczos · Ruslan Salakhutdinov · Andrew McCallum -
2019 Workshop: Sets and Partitions »
Nicholas Monath · Manzil Zaheer · Andrew McCallum · Ari Kobren · Junier Oliva · Barnabas Poczos · Ruslan Salakhutdinov -
2019 : Andrew McCallum: Learning DAGs and Trees with Box Embeddings and Hyperbolic Embeddings »
Andrew McCallum -
2019 Poster: Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks »
Amirmohammad Rooshenas · Dongxu Zhang · Gopal Sharma · Andrew McCallum -
2019 Poster: Poisson-Randomized Gamma Dynamical Systems »
Aaron Schein · Scott Linderman · Mingyuan Zhou · David Blei · Hanna Wallach -
2018 : Research Panel »
Sinead Williamson · Barbara Engelhardt · Tom Griffiths · Neil Lawrence · Hanna Wallach -
2018 : Panel on research process »
Zachary Lipton · Charles Sutton · Finale Doshi-Velez · Hanna Wallach · Suchi Saria · Rich Caruana · Thomas Rainforth -
2018 : Hanna Wallach - Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? »
Hanna Wallach -
2018 Poster: Compact Representation of Uncertainty in Clustering »
Craig Greenberg · Nicholas Monath · Ari Kobren · Patrick Flaherty · Andrew McGregor · Andrew McCallum -
2017 : Invited Talk: "Light Supervision of Structured Prediction Energy Networks" »
Andrew McCallum -
2017 : Poster spotlights »
Hiroshi Kuwajima · Masayuki Tanaka · Qingkai Liang · Matthieu Komorowski · Fanyu Que · Thalita F Drumond · Aniruddh Raghu · Leo Anthony Celi · Christina Göpfert · Andrew Ross · Sarah Tan · Rich Caruana · Yin Lou · Devinder Kumar · Graham Taylor · Forough Poursabzi-Sangdeh · Jennifer Wortman Vaughan · Hanna Wallach -
2017 Poster: Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples »
Haw-Shiuan Chang · Erik Learned-Miller · Andrew McCallum -
2016 Poster: Beyond Exchangeability: The Chinese Voting Process »
Moontae Lee · Seok Hyun Jin · David Mimno -
2016 Poster: Poisson-Gamma dynamical systems »
Aaron Schein · Hanna Wallach · Mingyuan Zhou -
2016 Oral: Beyond Exchangeability: The Chinese Voting Process »
Moontae Lee · Seok Hyun Jin · David Mimno -
2016 Oral: Poisson-Gamma dynamical systems »
Aaron Schein · Hanna Wallach · Mingyuan Zhou -
2016 Poster: Flexible Models for Microclustering with Application to Entity Resolution »
Brenda Betancourt · Giacomo Zanella · Jeffrey Miller · Hanna Wallach · Abbas Zaidi · Beka Steorts -
2015 Workshop: Bayesian Nonparametrics: The Next Generation »
Tamara Broderick · Nick Foti · Aaron Schein · Alex Tank · Hanna Wallach · Sinead Williamson -
2015 Poster: Robust Spectral Inference for Joint Stochastic Matrix Factorization »
Moontae Lee · David Bindel · David Mimno -
2014 Workshop: 4th Workshop on Automated Knowledge Base Construction (AKBC) »
Sameer Singh · Fabian M Suchanek · Sebastian Riedel · Partha Pratim Talukdar · Kevin Murphy · Christopher Ré · William Cohen · Tom Mitchell · Andrew McCallum · Jason E Weston · Ramanathan Guha · Boyan Onyshkevych · Hoifung Poon · Oren Etzioni · Ari Kobren · Arvind Neelakantan · Peter Clark -
2013 Workshop: Topic Models: Computation, Application, and Evaluation »
David Mimno · Amr Ahmed · Jordan Boyd-Graber · Ankur Moitra · Hanna Wallach · Alexander Smola · David Blei · Anima Anandkumar -
2012 Poster: Topic-Partitioned Multinetwork Embeddings »
Peter Krafft · Juston S Moore · Hanna Wallach · Bruce Desmarais -
2012 Poster: Scalable Inference of Overlapping Communities »
Prem Gopalan · David Mimno · Sean Gerrish · Michael Freedman · David Blei -
2012 Spotlight: Scalable Inference of Overlapping Communities »
Prem Gopalan · David Mimno · Sean Gerrish · Michael Freedman · David Blei -
2012 Poster: MAP Inference in Chains using Column Generation »
David Belanger · Alexandre T Passos · Sebastian Riedel · Andrew McCallum -
2011 Workshop: 2nd Workshop on Computational Social Science and the Wisdom of Crowds »
Winter Mason · Jennifer Wortman Vaughan · Hanna Wallach -
2011 Workshop: Big Learning: Algorithms, Systems, and Tools for Learning at Scale »
Joseph E Gonzalez · Sameer Singh · Graham Taylor · James Bergstra · Alice Zheng · Misha Bilenko · Yucheng Low · Yoshua Bengio · Michael Franklin · Carlos Guestrin · Andrew McCallum · Alexander Smola · Michael Jordan · Sugato Basu -
2011 Poster: Query-Aware MCMC »
Michael Wick · Andrew McCallum -
2010 Workshop: Computational Social Science and the Wisdom of Crowds »
Jennifer Wortman Vaughan · Hanna Wallach -
2009 Workshop: Applications for Topic Models: Text and Beyond »
David Blei · Jordan Boyd-Graber · Jonathan Chang · Katherine Heller · Hanna Wallach -
2009 Poster: FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs »
Andrew McCallum · Karl Schultz · Sameer Singh -
2009 Poster: Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference »
Michael Wick · Khashayar Rohanimanesh · Sameer Singh · Andrew McCallum -
2009 Spotlight: Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference »
Michael Wick · Khashayar Rohanimanesh · Sameer Singh · Andrew McCallum