Poster
Breaking the Span Assumption Yields Fast Finite-Sum Minimization
Robert Hannah · Yanli Liu · Daniel O'Connor · Wotao Yin
In this paper, we show that SVRG and SARAH can be modified to be fundamentally faster than all of the other standard algorithms that minimize the sum of $n$ smooth functions, such as SAGA, SAG, SDCA, and SDCA without duality. Most finite sum algorithms follow what we call the ``span assumption'': Their updates are in the span of a sequence of component gradients chosen in a random IID fashion. In the big data regime, where the condition number $\kappa=O(n)$, the span assumption prevents algorithms from converging to an approximate solution of accuracy $\epsilon$ in less than $n\ln(1/\epsilon)$ iterations. SVRG and SARAH do not follow the span assumption since they are updated with a hybrid of full-gradient and component-gradient information. We show that because of this, they can be up to $\Omega(1+(\ln(n/\kappa))_+)$ times faster. In particular, to obtain an accuracy $\epsilon = 1/n^\alpha$ for $\kappa=n^\beta$ and $\alpha,\beta\in(0,1)$, modified SVRG requires $O(n)$ iterations, whereas algorithms that follow the span assumption require $O(n\ln(n))$ iterations. Moreover, we present lower bound results that show this speedup is optimal, and provide analysis to help explain why this speedup exists. With the understanding that the span assumption is a point of weakness of finite sum algorithms, future work may purposefully exploit this to yield faster algorithms in the big data regime.
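To make the hybrid-update structure described in the abstract concrete, below is a minimal Python sketch, not the authors' modified SVRG; the names (`grads`, `eta`, `m`, `rng`) are illustrative assumptions. It contrasts an SVRG-style epoch, whose steps combine a full gradient with component gradients, against a plain span-assumption step built from a single sampled component gradient.

```python
import numpy as np

def svrg_epoch(x, grads, eta, m, rng):
    """One SVRG-style outer iteration (illustrative sketch only).

    grads: list of callables, grads[i](x) returns the gradient of f_i at x.
    The update mixes a full gradient (computed once per epoch at the anchor
    point x_tilde) with cheap component gradients, so iterates need not lie
    in the span of individually sampled component gradients.
    """
    n = len(grads)
    x_tilde = x.copy()
    full_grad = sum(g(x_tilde) for g in grads) / n   # one full-gradient pass
    for _ in range(m):                               # m cheap inner steps
        i = rng.integers(n)
        # variance-reduced stochastic gradient
        v = grads[i](x) - grads[i](x_tilde) + full_grad
        x = x - eta * v
    return x

def span_assumption_step(x, grads, eta, rng):
    """A span-assumption update: the step lies in the span of one sampled
    component gradient, as in SGD/SAGA-type methods (up to stored corrections)."""
    i = rng.integers(len(grads))
    return x - eta * grads[i](x)
```

The contrast is the point: each inner SVRG step draws on information from all $n$ components through the stored full gradient, which is what allows it to sidestep the $n\ln(1/\epsilon)$ barrier that applies to methods following the span assumption.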
Author Information
Robert Hannah (UCLA)
Yanli Liu (UCLA)
Daniel O'Connor (UCLA)
Wotao Yin (University of California, Los Angeles)
More from the Same Authors
- 2020 Poster: An Improved Analysis of Stochastic Gradient Descent with Momentum
  Yanli Liu · Yuan Gao · Wotao Yin
- 2020 Poster: An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods
  Yanli Liu · Kaiqing Zhang · Tamer Basar · Wotao Yin
- 2018 Poster: LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
  Tianyi Chen · Georgios Giannakis · Tao Sun · Wotao Yin
- 2018 Spotlight: LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
  Tianyi Chen · Georgios Giannakis · Tao Sun · Wotao Yin
- 2018 Poster: On Markov Chain Gradient Descent
  Tao Sun · Yuejiao Sun · Wotao Yin
- 2018 Poster: Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds
  Xiaohan Chen · Jialin Liu · Zhangyang Wang · Wotao Yin
- 2018 Spotlight: Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds
  Xiaohan Chen · Jialin Liu · Zhangyang Wang · Wotao Yin
- 2017 Poster: Straggler Mitigation in Distributed Optimization Through Data Encoding
  Can Karakus · Yifan Sun · Suhas Diggavi · Wotao Yin
- 2017 Poster: Asynchronous Coordinate Descent under More Realistic Assumptions
  Tao Sun · Robert Hannah · Wotao Yin
- 2017 Spotlight: Straggler Mitigation in Distributed Optimization Through Data Encoding
  Can Karakus · Yifan Sun · Suhas Diggavi · Wotao Yin