NeurIPS 2019 Schedule

Poster

Tue Dec 10 05:30 PM -- 07:30 PM (PST) @ East Exhibition Hall B + C #125

SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points

In Optimization -- Non-Convex Optimization

Zhize Li

We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) rather than just first-order stationary points, which may be bad unstable saddle points. We show that a simple perturbed version of the stochastic recursive gradient descent algorithm (called SSRGD) can find an $(\epsilon,\delta)$-second-order stationary point with $\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$ stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an $\epsilon$-first-order stationary point with $O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients. These results are almost optimal since Fang et al. [2018] provided a lower bound of $\Omega(\sqrt{n}/\epsilon^2)$ for finding even just an $\epsilon$-first-order stationary point. We emphasize that the SSRGD algorithm for finding second-order stationary points is as simple as the one for finding first-order stationary points, differing only in that it occasionally adds a uniform perturbation, while all other algorithms for finding second-order stationary points with similar gradient complexity need to be combined with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and Li, 2018]). Moreover, the simple SSRGD algorithm admits a simpler analysis. Finally, we extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems and prove the corresponding convergence results.
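
To make the idea concrete, here is a minimal sketch of a perturbed recursive-gradient loop. It assumes the usual reading of an $(\epsilon,\delta)$-second-order stationary point, namely $\|\nabla f(x)\| \le \epsilon$ and $\lambda_{\min}(\nabla^2 f(x)) \ge -\delta$. It is not the algorithm as specified in the paper: the function name `ssrgd_sketch`, the toy objective `toy_grad_i`, and all step sizes, thresholds, and epoch lengths are illustrative assumptions, and the paper's stopping rules and parameter choices are omitted.

```python
import numpy as np

# Hypothetical toy finite sum: f(w) = 0.5*w0^2 - 0.5*w1^2 + 0.25*w1^4,
# with a saddle at (0, 0) and local minima at (0, +1) and (0, -1).
# Each component adds 0.5*C[i]*||w||^2; C sums to zero, so the average is f.
C = np.array([-0.2, -0.1, 0.1, 0.2])

def toy_grad_i(i, w):
    base = np.array([w[0], -w[1] + w[1] ** 3])
    return base + C[i] * w

def ssrgd_sketch(grad_i, n, x0, eta=0.05, batch=4, epoch_len=8,
                 g_thresh=1e-3, radius=1e-2, n_epochs=200, seed=0):
    """Illustrative perturbed recursive-gradient loop (assumed parameters)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()

    def full_grad(z):
        return np.mean([grad_i(i, z) for i in range(n)], axis=0)

    for _ in range(n_epochs):
        v = full_grad(x)  # exact gradient at the start of each epoch
        if np.linalg.norm(v) <= g_thresh:
            # Small gradient: add a perturbation drawn uniformly from a ball.
            d = rng.normal(size=x.shape)
            d *= radius * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(d)
            x = x + d
            v = full_grad(x)
        for _ in range(epoch_len):
            x_new = x - eta * v
            idx = rng.integers(0, n, size=batch)
            # Recursive estimator: update v with minibatch gradient differences.
            v = v + np.mean([grad_i(i, x_new) - grad_i(i, x) for i in idx], axis=0)
            x = x_new
    return x

# Start on the line w1 = 0, where every component gradient keeps w1 at zero.
x_escaped = ssrgd_sketch(toy_grad_i, n=4, x0=[0.5, 0.0])
x_stuck = ssrgd_sketch(toy_grad_i, n=4, x0=[0.5, 0.0], radius=0.0)
```

On this toy problem, the run with `radius=0.0` never leaves the line $w_1 = 0$ and drifts to the saddle at the origin, while the perturbed run should leave the saddle and settle near one of the two minima. The only difference between the two runs is the occasional uniform perturbation, which is the design point the abstract emphasizes.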