Poster
Random Function Descent
Felix Benning · Leif Döring
West Ballroom A-D #6106
Classical worst-case optimization theory neither explains the success of optimization in machine learning, nor does it help with step size selection. We establish a connection between Bayesian Optimization (i.e. average case optimization theory) and classical optimization using a 'stochastic Taylor approximation' to rediscover gradient descent. This rediscovery yields a step size schedule we call Random Function Descent (RFD), which, in contrast to classical derivations, is scale invariant. Furthermore, our analysis of RFD step sizes yields a theoretical foundation for common step size heuristics such as gradient clipping and gradual learning rate warmup. We finally propose a statistical procedure for estimating the RFD step size schedule and validate this theory with a case study on the MNIST dataset.
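A minimal sketch of one plausible reading of the 'stochastic Taylor approximation' step, with all notation assumed here rather than taken from the poster: treating the cost f as a random function, the classical Taylor surrogate is replaced by the conditional expectation of the cost given the value and gradient at the current iterate w_n, and the abstract's claim is that minimizing this surrogate recovers a gradient descent step with an explicit, scale-invariant step size eta_n.

w_{n+1} = \arg\min_{w} \, \mathbb{E}\big[\, f(w) \mid f(w_n), \nabla f(w_n) \,\big] = w_n - \eta_n \nabla f(w_n)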