NeurIPS Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

Poster
in
Workshop: OPT 2023: Optimization for Machine Learning

Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

Michal Derezinski

[ Abstract ] [ Project Page ]

[ OpenReview]

Abstract: Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Yet, the benefits of variance reduction for first-order methods tend to vanish in the large-batch setting, i.e., when stochastic gradients are computed from very large mini-batches to leverage parallelization of modern computing architectures.On the other hand, incorporating second-order information via Newton-type methods has proven successful in improving the performance of large-batch algorithms. In this work, we show that, in the presence of second-order information, variance reduction in the gradient can provide significant convergence acceleration even when using extremely large-batch gradient estimates. To demonstrate this, we study a finite-sum minimization algorithm we call Stochastic Variance-Reduced Newton (SVRN). We show that SVRN provably accelerates existing stochastic Newton-type methods (such as Subsampled Newton), while retaining their parallelizable large-batch operations: The number of passes over the data is reduced from

O (α \log (1 / ϵ))

$O(\alpha\log(1/\epsilon))$ to

O (\frac{\log (1 / ϵ)}{\log (n)})

$O\big(\frac{\log(1/\epsilon)}{\log(n)}\big)$ , i.e., by a factor of

O (α \log (n))

$O(\alpha\log(n))$ , where

n

$n$ is the number of sum components and

α

$\alpha$ is the approximation factor in the Hessian estimate. Surprisingly, this acceleration gets more significant the larger thedata size

n

$n$ , and can be achieved with a unit step size.

Chat is not available.

Poster in Workshop: OPT 2023: Optimization for Machine Learning

Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

Michal Derezinski

Poster
in
Workshop: OPT 2023: Optimization for Machine Learning