Practice-Consistent Analysis of Adam-Style Methods
Zhishuai Guo · Yi Xu · Wotao Yin · Rong Jin · Tianbao Yang

In this paper, we present a simple and intuitive proof of convergence for a family of Adam-style methods (including Adam, AMSGrad, AdaBound, etc.) with an increasing or large "momentum" parameter for the first-order moment, which gives an alternative yet more natural way to guarantee that Adam converges in stochastic non-convex minimization. We also establish a variance-diminishing result for the stochastic gradient estimators used. The analysis is based on a widely used but not fully understood stochastic estimator using moving average (SEMA), which requires only a general unbiased stochastic oracle. In particular, we analyze Adam-style methods through the variance recursion property of SEMA.
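As a rough illustration of the moving-average estimator mentioned above, the following sketch applies an exponential moving average to noisy gradients and uses it in an Adam-style step. It is a minimal sketch under stated assumptions, not the algorithm or parameter schedule analyzed in the paper: the function names (`sema_update`, `adam_style_step`, `run_demo`), the constant values of `beta1`, `beta2`, `lr`, and `eps`, the omission of bias correction, and the toy objective f(x) = 0.5 * x^2 with Gaussian gradient noise are all assumptions made for the example.

```python
import math
import random


def sema_update(z, grad, beta):
    """Moving-average gradient estimate: beta * z + (1 - beta) * grad."""
    return beta * z + (1.0 - beta) * grad


def adam_style_step(x, z, v, grad, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step on a scalar parameter (no bias correction).

    All default values are illustrative assumptions, not taken from the paper.
    """
    z = sema_update(z, grad, beta1)               # first-order moment (moving average)
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-order moment
    x = x - lr * z / (math.sqrt(v) + eps)         # coordinate-wise scaled update
    return x, z, v


def run_demo(steps=2000, seed=0):
    """Minimize the toy objective f(x) = 0.5 * x**2 from noisy gradients."""
    rng = random.Random(seed)
    x, z, v = 3.0, 0.0, 0.0
    for _ in range(steps):
        # Unbiased stochastic gradient: true gradient x plus zero-mean noise.
        grad = x + 0.1 * rng.gauss(0.0, 1.0)
        x, z, v = adam_style_step(x, z, v, grad)
    return x


if __name__ == "__main__":
    print(run_demo())
```

The only requirement placed on the gradient oracle in this sketch is that it be unbiased, which matches the setting described in the abstract; everything else is a simplification chosen for readability.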

Author Information

Zhishuai Guo (University of Iowa)
Yi Xu (Alibaba Group)
Wotao Yin (Alibaba US, DAMO Academy)
Rong Jin (Alibaba)
Tianbao Yang (The University of Iowa)

More from the Same Authors