We take a new look at parameter estimation for Gaussian Mixture Model (GMMs). Specifically, we advance Riemannian manifold optimization (on the manifold of positive definite matrices) as a potential replacement for Expectation Maximization (EM), which has been the de facto standard for decades. An out-of-the-box invocation of Riemannian optimization, however, fails spectacularly: it obtains the same solution as EM, but vastly slower. Building on intuition from geometric convexity, we propose a simple reformulation that has remarkable consequences: it makes Riemannian optimization not only match EM (a nontrivial result on its own, given the poor record nonlinear programming has had against EM), but also outperform it in many settings. To bring our ideas to fruition, we develop a well-tuned Riemannian LBFGS method that proves superior to known competing methods (e.g., Riemannian conjugate gradient). We hope that our results encourage a wider consideration of manifold optimization in machine learning and statistics.
Reshad Hosseini (University of Tehran)
Suvrit Sra (MIT)
Suvrit Sra is a Research Faculty at the Laboratory for Information and Decision Systems (LIDS) at Massachusetts Institute of Technology (MIT). He obtained his PhD in Computer Science from the University of Texas at Austin in 2007. Before moving to MIT, he was a Senior Research Scientist at the Max Planck Institute for Intelligent Systems, in Tübingen, Germany. He has also held visiting faculty positions at UC Berkeley (EECS) and Carnegie Mellon University (Machine Learning Department) during 2013-2014. His research is dedicated to bridging a number of mathematical areas such as metric geometry, matrix analysis, convex analysis, probability theory, and optimization with machine learning; more broadly, his work involves algorithmically grounded topics within engineering and science. He has been a co-chair for OPT2008-2015, NIPS workshops on "Optimization for Machine Learning," and has also edited a volume of the same name (MIT Press, 2011).
More from the Same Authors
2017 Poster: Elementary Symmetric Polynomials for Optimal Experimental Design »
Zelda Mariet · Suvrit Sra
2017 Poster: Polynomial time algorithms for dual volume sampling »
Chengtao Li · Stefanie Jegelka · Suvrit Sra
2016 Tutorial: Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity »
Suvrit Sra · Francis Bach
2015 Poster: On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants »
Sashank J. Reddi · Ahmed Hefny · Suvrit Sra · Barnabas Poczos · Alexander Smola