We describe an approach to speed up inference with latent-variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs, coupled with a tensor decomposition algorithm well known in the multilinear algebra literature. We also describe an error bound for this approximation, which bounds the difference between the probabilities calculated by the algorithm and the true probabilities that the approximated model gives. Empirical evaluation on real-world natural language parsing data demonstrates a significant speed-up at minimal cost to parsing performance.
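To illustrate why a tensor decomposition can speed up this kind of inference, here is a minimal sketch. It is not the authors' implementation. The abstract does not name the decomposition, so the sketch assumes a CP-style factorization, in which the tensor is a sum of R rank-one terms. The function names and the sizes `m` (number of latent states) and `R` (number of rank-one terms) are hypothetical. Contracting a dense m×m×m tensor with two vectors costs O(m³), while the factored form costs O(Rm).

```python
import random


def contract_full(T, y1, y2):
    """Dense contraction: out[i] = sum over j, k of T[i][j][k] * y1[j] * y2[k]."""
    m = len(T)
    return [
        sum(T[i][j][k] * y1[j] * y2[k] for j in range(m) for k in range(m))
        for i in range(m)
    ]


def contract_cp(U, V, W, y1, y2):
    """Same contraction, assuming T[i][j][k] = sum over r of U[r][i] * V[r][j] * W[r][k]."""
    m = len(U[0])
    out = [0.0] * m
    for u, v, w in zip(U, V, W):
        # Two dot products per rank-one term, each O(m).
        a = sum(vj * yj for vj, yj in zip(v, y1))
        b = sum(wk * yk for wk, yk in zip(w, y2))
        scale = a * b
        for i in range(m):
            out[i] += scale * u[i]
    return out


def build_tensor(U, V, W):
    """Materialize the dense tensor from its factors (for comparison only)."""
    m = len(U[0])
    return [
        [
            [sum(u[i] * v[j] * w[k] for u, v, w in zip(U, V, W)) for k in range(m)]
            for j in range(m)
        ]
        for i in range(m)
    ]


random.seed(0)
m, R = 6, 3  # hypothetical sizes chosen for illustration
U = [[random.random() for _ in range(m)] for _ in range(R)]
V = [[random.random() for _ in range(m)] for _ in range(R)]
W = [[random.random() for _ in range(m)] for _ in range(R)]
y1 = [random.random() for _ in range(m)]
y2 = [random.random() for _ in range(m)]

T = build_tensor(U, V, W)
full = contract_full(T, y1, y2)
fast = contract_cp(U, V, W, y1, y2)
max_diff = max(abs(a - b) for a, b in zip(full, fast))
```

Here the tensor is built to have rank exactly R, so the two results agree up to floating-point rounding. For a tensor that is only approximately low rank, the factored result would be an approximation, which is the situation the error bound mentioned above addresses.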
Shay Cohen (Columbia University)
Michael Collins (Columbia University)
Michael Collins is the Vikram S. Pandit Professor of Computer Science at Columbia University. His research focuses on statistical parsing, structured prediction problems in machine learning, and applications including machine translation, dialog systems, and speech recognition. His awards include a Sloan fellowship, an NSF CAREER award, and best paper awards at EMNLP (2002, 2004, and 2010), UAI (2004 and 2005), and CoNLL 2008.
More from the Same Authors
2011 Session: Oral Session 2 »
2011 Tutorial: Lagrangian Relaxation Algorithms for Inference in Natural Language Processing »
Alexander Rush · Michael Collins
2010 Poster: Empirical Risk Minimization with Approximations of Probabilistic Grammars »
Shay Cohen · Noah A Smith
2009 Poster: Learning Label Embeddings for Nearest-Neighbor Multi-class Classification with an Application to Speech Recognition »
Natasha Singh-Miller · Michael Collins
2008 Poster: Unsupervised Bayesian Parameter Estimation for Probabilistic Grammars »
Shay Cohen · Kevin Gimpel · Noah A Smith