

Poster in Workshop: OPT 2023: Optimization for Machine Learning

Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

Michal Derezinski


Abstract: Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Yet, the benefits of variance reduction for first-order methods tend to vanish in the large-batch setting, i.e., when stochastic gradients are computed from very large mini-batches to leverage parallelization on modern computing architectures. On the other hand, incorporating second-order information via Newton-type methods has proven successful in improving the performance of large-batch algorithms. In this work, we show that, in the presence of second-order information, variance reduction in the gradient can provide significant convergence acceleration even when using extremely large-batch gradient estimates. To demonstrate this, we study a finite-sum minimization algorithm we call Stochastic Variance-Reduced Newton (SVRN). We show that SVRN provably accelerates existing stochastic Newton-type methods (such as Subsampled Newton), while retaining their parallelizable large-batch operations: the number of passes over the data is reduced from O(α log(1/ϵ)) to O(log(1/ϵ)/log(n)), i.e., by a factor of O(α log(n)), where n is the number of sum components and α is the approximation factor in the Hessian estimate. Surprisingly, this acceleration becomes more significant the larger the data size n, and can be achieved with a unit step size.
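To make the structure of such an update concrete, below is a minimal sketch of an SVRN-style loop, assuming L2-regularized logistic regression as the finite sum. It combines an SVRG-style variance-reduced large-batch gradient with a Newton-type step through a subsampled Hessian estimate and a unit step size. The helper names and parameters (`svrn`, `hessian_batch_size`, `grad_batch_size`, `inner_steps`, etc.) are illustrative assumptions, not the paper's reference implementation.

```python
# Illustrative SVRN-style sketch (not the authors' code): SVRG-type variance
# reduction on the gradient plus a subsampled-Newton step with unit step size,
# for L2-regularized logistic regression. All batch sizes and iteration counts
# below are hypothetical illustration choices.
import numpy as np

def logistic_grad(w, X, y, lam):
    """Gradient of (1/m) sum_i log(1+exp(-y_i x_i^T w)) + (lam/2)||w||^2 over rows of X."""
    z = X @ w
    s = -y / (1.0 + np.exp(y * z))          # per-example loss derivative
    return X.T @ s / X.shape[0] + lam * w

def logistic_hessian(w, X, lam):
    """Hessian of the same objective over the rows of X."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    D = p * (1.0 - p)                        # diagonal curvature weights
    return (X.T * D) @ X / X.shape[0] + lam * np.eye(X.shape[1])

def svrn(X, y, lam=1e-3, outer_iters=10, inner_steps=5,
         hessian_batch_size=2000, grad_batch_size=20000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w_snap = np.zeros(d)
    for _ in range(outer_iters):
        # Snapshot: one full-gradient pass and a subsampled Hessian estimate.
        g_full = logistic_grad(w_snap, X, y, lam)
        h_idx = rng.choice(n, size=min(hessian_batch_size, n), replace=False)
        H_hat = logistic_hessian(w_snap, X[h_idx], lam)

        w = w_snap.copy()
        for _ in range(inner_steps):
            # Large-batch variance-reduced gradient: g_B(w) - g_B(w_snap) + g_full.
            idx = rng.choice(n, size=min(grad_batch_size, n), replace=False)
            g_vr = (logistic_grad(w, X[idx], y[idx], lam)
                    - logistic_grad(w_snap, X[idx], y[idx], lam)
                    + g_full)
            # Newton-type step with unit step size using the fixed Hessian estimate.
            w = w - np.linalg.solve(H_hat, g_vr)
        w_snap = w
    return w_snap

# Usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.standard_normal((50_000, 20))
y = np.sign(X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(50_000))
w_hat = svrn(X, y)
print("final gradient norm:", np.linalg.norm(logistic_grad(w_hat, X, y, 1e-3)))
```

Note the design point the abstract emphasizes: the per-step gradient work is a large, embarrassingly parallel mini-batch computation, and no step-size tuning is needed because the Newton-type step is taken with unit length.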
