Multi-dimensional latent variable models can capture the many latent factors in a text corpus, such as topic, author perspective, and sentiment. We introduce factorial LDA, a multi-dimensional latent variable model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (e.g., methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors.
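The core idea of a K-dimensional latent vector per word token can be illustrated with a toy sampler. This is only a minimal sketch under stated assumptions, not the authors' model: the function names (`dirichlet`, `sample_document`), the symmetric Dirichlet priors, the factor sizes, and the vocabulary size are all hypothetical, and the structured word priors and sparsity mechanism mentioned in the abstract are deliberately left out.

```python
import itertools
import random


def dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_document(num_tokens, factor_sizes, vocab_size, rng):
    """Sample a toy document; every token carries a K-dimensional latent vector."""
    # Enumerate every combination of factor values, e.g. (topic, discipline, focus).
    tuples = list(itertools.product(*(range(n) for n in factor_sizes)))
    # Assumption: plain symmetric priors stand in for the paper's structured priors.
    doc_tuple_dist = dirichlet(0.5, len(tuples), rng)
    word_dists = [dirichlet(0.5, vocab_size, rng) for _ in tuples]

    tokens = []
    for _ in range(num_tokens):
        # Pick the latent vector for this token, then a word given that vector.
        t = rng.choices(range(len(tuples)), weights=doc_tuple_dist, k=1)[0]
        w = rng.choices(range(vocab_size), weights=word_dists[t], k=1)[0]
        tokens.append((tuples[t], w))
    return tokens


rng = random.Random(0)
doc = sample_document(num_tokens=20, factor_sizes=(3, 2, 2), vocab_size=50, rng=rng)
```

Here K = 3, so each entry of `doc` pairs a 3-tuple of factor values with a word index. Enumerating all tuples grows multiplicatively with the factor sizes, which is one reason a sparse product of factors is attractive.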
Author Information
Michael J Paul (Johns Hopkins University)
Mark Dredze (Johns Hopkins)
More from the Same Authors
- 2022: The Importance of Temperature in Multi-Task Optimization
  David Mueller · Mark Dredze · Nicholas Andrews
- 2020: Mark Dredze: Reducing Health Disparities in the Future of Medicine
  Mark Dredze
- 2009 Poster: Adaptive Regularization of Weight Vectors
  Yacov Crammer · Alex Kulesza · Mark Dredze
- 2009 Spotlight: Adaptive Regularization of Weight Vectors
  Yacov Crammer · Alex Kulesza · Mark Dredze
- 2008 Poster: Exact Convex Confidence-Weighted Learning
  Yacov Crammer · Mark Dredze · Fernando Pereira
- 2008 Spotlight: Exact Convex Confidence-Weighted Learning
  Yacov Crammer · Mark Dredze · Fernando Pereira