Poster in Workshop: Order up! The Benefits of Higher-Order Optimization in Machine Learning

Effects of momentum scaling for SGD
Dmitry A. Pasechnyuk · Alexander Gasnikov · Martin Takac
Abstract:
The paper studies the properties of stochastic gradient methods with preconditioning. We focus on preconditioners updated with momentum, governed by a momentum coefficient β. Seeking to explain the practical efficiency of scaled methods, we provide a convergence analysis in the norm associated with the preconditioner and demonstrate that scaling allows one to remove the gradient Lipschitz constant from the convergence rates. Along the way, we emphasize the important role of β, which various authors set to an arbitrary constant without justification. Finally, we propose explicit constructive formulas for adaptive β and step-size values.
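As a rough illustration of the class of methods the abstract refers to, the sketch below implements SGD with a diagonal preconditioner maintained as an exponential moving average of squared stochastic gradients with momentum coefficient β (RMSProp/Adam-style scaling). The function, hyperparameters, and the particular diagonal preconditioner are illustrative assumptions, not the paper's exact algorithm or its adaptive formulas for β and the step size.

```python
import numpy as np

def preconditioned_sgd(grad, x0, lr=1e-3, beta=0.999, eps=1e-8, n_steps=1000):
    """Sketch of SGD with a momentum-updated diagonal preconditioner.

    The preconditioner is an exponential moving average of squared
    stochastic gradients with momentum coefficient `beta`; each step is
    scaled by the inverse square root of this accumulator (an assumption
    mirroring RMSProp/Adam-style scaling, not the authors' formulas).
    """
    x = np.asarray(x0, dtype=float).copy()
    d = np.zeros_like(x)                        # diagonal preconditioner accumulator
    for _ in range(n_steps):
        g = grad(x)                             # stochastic gradient oracle
        d = beta * d + (1.0 - beta) * g ** 2    # momentum update of the preconditioner
        x -= lr * g / (np.sqrt(d) + eps)        # scaled (preconditioned) step
    return x

# Toy usage: noisy gradient of the quadratic f(x) = 0.5 * ||A x||^2
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0, 100.0])
noisy_grad = lambda x: A.T @ (A @ x) + 0.01 * rng.standard_normal(x.shape)
x_hat = preconditioned_sgd(noisy_grad, x0=np.ones(3), lr=0.1, beta=0.999, n_steps=2000)
```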