Skip to yearly menu bar Skip to main content

Workshop: OPT 2023: Optimization for Machine Learning

Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

Michal Derezinski

Abstract: Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Yet, the benefits of variance reduction for first-order methods tend to vanish in the large-batch setting, i.e., when stochastic gradients are computed from very large mini-batches to leverage parallelization of modern computing architectures.On the other hand, incorporating second-order information via Newton-type methods has proven successful in improving the performance of large-batch algorithms. In this work, we show that, in the presence of second-order information, variance reduction in the gradient can provide significant convergence acceleration even when using extremely large-batch gradient estimates. To demonstrate this, we study a finite-sum minimization algorithm we call Stochastic Variance-Reduced Newton (SVRN). We show that SVRN provably accelerates existing stochastic Newton-type methods (such as Subsampled Newton), while retaining their parallelizable large-batch operations: The number of passes over the data is reduced from $O(\alpha\log(1/\epsilon))$ to $O\big(\frac{\log(1/\epsilon)}{\log(n)}\big)$, i.e., by a factor of $O(\alpha\log(n))$, where $n$ is the number of sum components and $\alpha$ is the approximation factor in the Hessian estimate. Surprisingly, this acceleration gets more significant the larger thedata size $n$, and can be achieved with a unit step size.

Chat is not available.