
Workshop: OPT 2021: Optimization for Machine Learning

Practice-Consistent Analysis of Adam-Style Methods

Zhishuai Guo · Yi Xu · Wotao Yin · Rong Jin · Tianbao Yang


In this paper, we present a simple and intuitive proof of convergence for a family of Adam-style methods (including Adam, AMSGrad, AdaBound, etc.) with an increasing or large "momentum" parameter for the first-order moment. This gives an alternative, and more natural, way to guarantee that Adam converges in stochastic non-convex minimization. We also establish a variance-diminishing result for the stochastic gradient estimators used. The analysis is based on a widely used but not fully understood stochastic estimator using moving average (SEMA), which requires only a general unbiased stochastic oracle. In particular, we analyze Adam-style methods through the variance recursion property of SEMA for stochastic non-convex minimization.
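To make the moving averages concrete, here is a minimal Python sketch of an Adam-style step whose first moment is a SEMA. It assumes the parameterization z <- (1 - beta) * z + beta * g, where beta weights the new stochastic gradient. Under this parameterization Adam's first-moment parameter is beta1 = 1 - beta, so a "large momentum parameter" means a small beta. The helper names (`sema`, `adam_style_step`, `noisy_grad`, `sema_variance`) are hypothetical, and the omission of bias correction and the noisy quadratic oracle are likewise illustrative assumptions. This is not the paper's exact algorithm or its analysis. The variance check uses a simplified setting: at a fixed point with i.i.d. noise of variance sigma^2, the stationary variance of the SEMA is beta * sigma^2 / (2 - beta), well below sigma^2 for small beta.

```python
import math
import random


def sema(z, g, beta):
    """One SEMA step, elementwise: z <- (1 - beta) * z + beta * g.

    Assumed parameterization: beta weights the new stochastic gradient,
    so Adam's first-moment parameter corresponds to beta1 = 1 - beta.
    """
    return [(1.0 - beta) * zi + beta * gi for zi, gi in zip(z, g)]


def adam_style_step(x, z, v, v_max, g, lr, beta, beta2=0.999,
                    eps=1e-8, amsgrad=False):
    """One generic Adam-style update (bias correction omitted for brevity).

    z: SEMA first-moment estimate; v: EMA of squared gradients;
    v_max: running maximum of v, used as the denominator when amsgrad=True.
    """
    z = sema(z, g, beta)
    v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
    v_max = [max(a, b) for a, b in zip(v_max, v)]
    denom = v_max if amsgrad else v
    x = [xi - lr * zi / (math.sqrt(di) + eps)
         for xi, zi, di in zip(x, z, denom)]
    return x, z, v, v_max


def noisy_grad(x, sigma, rng):
    """Hypothetical unbiased oracle for f(x) = 0.5 * ||x||^2: exact gradient plus Gaussian noise."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def sema_variance(beta, sigma, steps, rng):
    """Empirical variance of a SEMA fed pure noise (true gradient is 0)."""
    z, samples = [0.0], []
    for t in range(steps):
        z = sema(z, [rng.gauss(0.0, sigma)], beta)
        if t >= steps // 2:  # discard the first half as burn-in
            samples.append(z[0])
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    beta, sigma = 0.1, 1.0
    emp = sema_variance(beta, sigma, 200_000, rng)
    theory = beta * sigma**2 / (2 - beta)
    print(f"SEMA variance {emp:.4f}, theory {theory:.4f}, raw sample {sigma**2:.1f}")

    # Run the Adam-style method (AMSGrad denominator) on the noisy quadratic.
    x = [5.0, -3.0]
    z, v, v_max = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for _ in range(2000):
        g = noisy_grad(x, sigma, rng)
        x, z, v, v_max = adam_style_step(x, z, v, v_max, g, lr=0.01,
                                         beta=beta, amsgrad=True)
    print("final x:", x)
```

With beta = 0.1 the predicted SEMA variance is about 0.053 sigma^2. Setting `amsgrad=True` replaces the second-moment denominator with its running maximum, as in AMSGrad, while `amsgrad=False` gives a plain Adam-style step under the same assumptions.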
