Recently, several works have shown that natural modifications of the classical conditional gradient method (a.k.a. the Frank-Wolfe algorithm) for constrained convex optimization provably converge at a linear rate when the feasible set is a polytope and the objective is smooth and strongly convex. However, all of these results suffer from two significant shortcomings: i) a large memory requirement due to the need to store an explicit convex decomposition of the current iterate, and, as a consequence, a large running-time overhead per iteration; and ii) a worst-case convergence rate that depends unfavorably on the dimension. In this work we present a new conditional gradient variant and a corresponding analysis that improve on both of the above shortcomings. In particular, both the memory and computation overheads are only linear in the dimension, and, in addition, when the optimal solution is sparse, the new convergence rate replaces a factor which is at least linear in the dimension in previous works with a linear dependence on the number of non-zeros in the optimal solution. At the heart of our method, and of the corresponding analysis, is a novel way to compute decomposition-invariant away-steps. While our theoretical guarantees do not apply to arbitrary polytopes, they do apply to several important structured polytopes that capture central concepts such as paths in graphs, perfect matchings in bipartite graphs, marginal distributions that arise in structured prediction tasks, and more. Our theoretical findings are complemented by empirical evidence showing that our method delivers state-of-the-art performance.
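To make the high-level description above more concrete, below is a minimal, illustrative sketch of a Frank-Wolfe iteration with an away step chosen directly from the support of the current iterate (its non-zero coordinates), specialized to the probability simplex. This is a hedged example under several assumptions: the function name, the smoothness-based short-step rule, and the tolerance are illustrative choices, and the sketch is not the paper's algorithm nor the structured polytopes it analyzes; it only illustrates how an away step can be computed without storing an explicit convex decomposition of the iterate.

```python
import numpy as np


def away_step_fw_simplex(grad, x0, L=1.0, num_iters=1000, tol=1e-8):
    """Illustrative Frank-Wolfe with support-based away steps on the simplex.

    grad : callable returning the gradient of the smooth objective at x.
    L    : assumed smoothness constant (used for the short-step rule).
    This is a generic sketch, not the algorithm analyzed in the paper.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad(x)

        # Toward (Frank-Wolfe) vertex: the basis vector minimizing <g, v>.
        i_fw = int(np.argmin(g))
        d_fw = -x.copy()
        d_fw[i_fw] += 1.0  # direction e_{i_fw} - x

        # Away vertex: taken directly from the support of x, so no explicit
        # convex decomposition of the iterate needs to be stored.
        support = np.flatnonzero(x > 1e-12)
        i_aw = support[int(np.argmax(g[support]))]
        d_aw = x.copy()
        d_aw[i_aw] -= 1.0  # direction x - e_{i_aw}

        # Pick the direction with the more negative slope <g, d>.
        if g @ d_fw <= g @ d_aw:
            d, gamma_max = d_fw, 1.0
        else:
            d, gamma_max = d_aw, x[i_aw] / (1.0 - x[i_aw])

        gap = -(g @ d)
        if gap <= tol:
            break

        # Short-step rule from the smoothness upper bound, clipped so the
        # iterate stays feasible.
        gamma = min(gamma_max, gap / (L * (d @ d)))
        x = x + gamma * d
    return x


if __name__ == "__main__":
    # Toy usage: minimize ||x - c||^2 over the probability simplex.
    c = np.array([0.7, 0.4, -0.2, 0.1])
    x_opt = away_step_fw_simplex(lambda x: 2.0 * (x - c), np.ones(4) / 4, L=2.0)
    print(np.round(x_opt, 4))
```

Because the away vertex is read off the non-zero coordinates of x rather than a stored convex decomposition, the memory and per-iteration overheads stay linear in the dimension, which is the property highlighted in the abstract.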
Author Information
Dan Garber (Technion / Toyota Technological Institute at Chicago)
Dan's research interests lie at the intersection of machine learning and continuous optimization. Dan's main focus is on the development of efficient algorithms with novel and provable performance guarantees for basic machine learning, data analysis, decision making, and optimization problems. Dan received both his Ph.D. and his M.Sc. degrees from the Technion - Israel Institute of Technology, where he worked under the supervision of Prof. Elad Hazan. Before that, Dan completed his bachelor's degree in computer engineering, also at the Technion.
Ofer Meshi (Google)
More from the Same Authors
- 2016 Poster: Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis »
  Weiran Wang · Jialei Wang · Dan Garber · Nati Srebro
- 2016 Poster: Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes »
  Dan Garber · Ofer Meshi
- 2016 Poster: Faster Projection-free Convex Optimization over the Spectrahedron »
  Dan Garber
- 2015 Poster: Smooth and Strong: MAP Inference with Linear Convergence »
  Ofer Meshi · Mehrdad Mahdavi · Alex Schwing
- 2012 Poster: Convergence Rate Analysis of MAP Coordinate Minimization Algorithms »
  Ofer Meshi · Tommi Jaakkola · Amir Globerson
- 2011 Poster: Approximating Semidefinite Programs in Sublinear Time »
  Dan Garber · Elad Hazan
- 2010 Spotlight: More data means less inference: A pseudo-max approach to structured learning »
  David Sontag · Ofer Meshi · Tommi Jaakkola · Amir Globerson
- 2010 Poster: More data means less inference: A pseudo-max approach to structured learning »
  David Sontag · Ofer Meshi · Tommi Jaakkola · Amir Globerson