Poster in Workshop: Order up! The Benefits of Higher-Order Optimization in Machine Learning

Effects of momentum scaling for SGD
Dmitry A. Pasechnyuk · Alexander Gasnikov · Martin Takac
Abstract:
The paper studies the properties of stochastic gradient methods with preconditioning. We focus on preconditioners updated with momentum, governed by a momentum coefficient β. Seeking to explain the practical efficiency of scaled methods, we provide a convergence analysis in the norm associated with the preconditioner and demonstrate that scaling allows one to remove the gradient Lipschitz constant from the convergence rates. Along the way, we emphasize the important role of β, which various authors set to an arbitrary constant without justification. Finally, we propose explicit constructive formulas for adaptive β and step-size values.
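As a rough illustration of the class of methods the abstract refers to, the sketch below implements SGD with a diagonal preconditioner maintained as an exponential moving average of squared stochastic gradients with momentum coefficient β (RMSProp/Adam-style scaling). The function, hyperparameters, and the particular diagonal preconditioner are illustrative assumptions, not the paper's exact algorithm or its adaptive formulas for β and the step size.

```python
import numpy as np

def preconditioned_sgd(grad, x0, lr=1e-3, beta=0.999, eps=1e-8, n_steps=1000):
    """Sketch of SGD with a momentum-updated diagonal preconditioner.

    The preconditioner is an exponential moving average of squared
    stochastic gradients with momentum coefficient `beta`; each step is
    scaled by the inverse square root of this accumulator (an assumption
    mirroring RMSProp/Adam-style scaling, not the authors' formulas).
    """
    x = np.asarray(x0, dtype=float).copy()
    d = np.zeros_like(x)                        # diagonal preconditioner accumulator
    for _ in range(n_steps):
        g = grad(x)                             # stochastic gradient oracle
        d = beta * d + (1.0 - beta) * g ** 2    # momentum update of the preconditioner
        x -= lr * g / (np.sqrt(d) + eps)        # scaled (preconditioned) step
    return x

# Toy usage: noisy gradient of the quadratic f(x) = 0.5 * ||A x||^2
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0, 100.0])
noisy_grad = lambda x: A.T @ (A @ x) + 0.01 * rng.standard_normal(x.shape)
x_hat = preconditioned_sgd(noisy_grad, x0=np.ones(3), lr=0.1, beta=0.999, n_steps=2000)
```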