We formulate the problem of neural network optimization as Bayesian filtering, where the observations are backpropagated gradients. While neural network optimization has previously been studied using natural gradient methods, which are closely related to Bayesian inference, those methods were unable to recover standard optimizers such as Adam and RMSprop with a root-mean-square gradient normalizer, instead yielding a mean-square normalizer. To recover the root-mean-square normalizer, we find it necessary to account for the temporal dynamics of all the other parameters as they are optimized. The resulting optimizer, AdaBayes, adaptively transitions between SGD-like and Adam-like behaviour, automatically recovers AdamW, a state-of-the-art variant of Adam with decoupled weight decay, and achieves generalisation performance competitive with SGD.
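To make the normalizer distinction concrete, here is a minimal illustrative sketch (not the paper's AdaBayes update; the function name `rmsprop_like_step` and the hyperparameters `beta` and `eps` are assumptions for illustration) contrasting the root-mean-square normalizer used by Adam/RMSprop with the mean-square normalizer that earlier natural-gradient analyses recover:

```python
import numpy as np

def rmsprop_like_step(w, grad, v, lr=1e-3, beta=0.999, eps=1e-8,
                      normalizer="rms"):
    """Single illustrative update step (not the paper's AdaBayes rule)."""
    # Exponential moving average of the squared gradient.
    v = beta * v + (1.0 - beta) * grad ** 2

    if normalizer == "rms":
        # Adam/RMSprop: divide by the root-mean-square of the gradient.
        w = w - lr * grad / (np.sqrt(v) + eps)
    else:
        # Natural-gradient-style result discussed in the abstract:
        # divide by the mean-square of the gradient instead.
        w = w - lr * grad / (v + eps)
    return w, v

# Example usage on a toy parameter vector.
w, v = np.zeros(3), np.zeros(3)
w, v = rmsprop_like_step(w, np.array([0.1, -0.2, 0.3]), v, normalizer="rms")
```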
Author Information
Laurence Aitchison (University of Cambridge)
More from the Same Authors
- 2022: Random initialisations performing above chance and how to find them
  Frederik Benzing · Simon Schug · Robert Meier · Johannes von Oswald · Yassir Akram · Nicolas Zucchet · Laurence Aitchison · Angelika Steger
- 2021 Poster: A variational approximate posterior for the deep Wishart process
  Sebastian Ober · Laurence Aitchison
- 2019 Poster: Tensor Monte Carlo: Particle Methods for the GPU era
  Laurence Aitchison
- 2017 Oral: Model-based Bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit
  Laurence Aitchison · Lloyd Russell · Adam Packer · Jinyao Yan · Philippe Castonguay · Michael Hausser · Srinivas C Turaga
- 2017 Poster: Model-based Bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit
  Laurence Aitchison · Lloyd Russell · Adam Packer · Jinyao Yan · Philippe Castonguay · Michael Hausser · Srinivas C Turaga
- 2014 Poster: Fast Sampling-Based Inference in Balanced Neuronal Networks
  Guillaume Hennequin · Laurence Aitchison · Mate Lengyel