We propose a new, more general approach to the design of stochastic gradient-based optimization methods for machine learning. In this framework, optimizers assume access to a batch of gradient estimates per iteration, rather than a single estimate, which better reflects the information actually available in typical machine learning setups. To demonstrate the usefulness of this generalized approach, we develop Eve, an adaptation of the Adam optimizer that uses examplewise gradients to obtain more accurate second-moment estimates. We provide preliminary experiments, without hyperparameter tuning, which show that the new optimizer slightly outperforms Adam on a small-scale benchmark and performs the same as or worse than Adam on larger-scale benchmarks. Further work is needed to refine the algorithm and tune hyperparameters.
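The abstract does not spell out the Eve update rule, so the following is a minimal sketch of the general idea in JAX, assuming the second-moment estimate is formed from the mean of squared per-example gradients (obtained with jax.vmap) rather than from the squared mini-batch mean gradient as in Adam. The function name eve_style_update, its state layout, and the hyperparameter defaults are hypothetical, not the authors' exact algorithm.

```python
import jax
import jax.numpy as jnp

def eve_style_update(params, state, x_batch, y_batch, loss_fn,
                     lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One hypothetical Eve-style step (a sketch, not the paper's algorithm).

    loss_fn(params, x, y) is assumed to return a scalar loss for one example.
    state is (m, v, t): first moment, second moment, and step count.
    """
    m, v, t = state

    # Examplewise gradients: vmap over the batch axis of the data only.
    grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(
        params, x_batch, y_batch)

    # Mean gradient drives the first moment, as in Adam.
    g = jax.tree_util.tree_map(lambda a: a.mean(axis=0), grads)
    # Mean of squared per-example gradients: an estimate of the gradient's
    # second moment that uses more information than squaring the mean.
    s = jax.tree_util.tree_map(lambda a: (a ** 2).mean(axis=0), grads)

    t = t + 1
    m = jax.tree_util.tree_map(lambda m_, g_: b1 * m_ + (1 - b1) * g_, m, g)
    v = jax.tree_util.tree_map(lambda v_, s_: b2 * v_ + (1 - b2) * s_, v, s)

    def update(p, m_, v_):
        m_hat = m_ / (1 - b1 ** t)  # Adam-style bias correction
        v_hat = v_ / (1 - b2 ** t)
        return p - lr * m_hat / (jnp.sqrt(v_hat) + eps)

    new_params = jax.tree_util.tree_map(update, params, m, v)
    return new_params, (m, v, t)
```

Under this interpretation, the mean of squared per-example gradients estimates the true second moment of the gradient directly, whereas Adam's squared mini-batch mean conflates the squared expectation with the sampling variance of the batch average.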
Author Information
Julius Kunze (University College London)
James Townsend (University College London)
David Barber (University College London)
More from the Same Authors
- 2021: Your Dataset is a Multiset and You Should Compress it Like One
  Daniel Severo · James Townsend · Ashish Khisti · Alireza Makhzani · Karen Ullrich
- 2021: Poster Session 1 (gather.town)
  Hamed Jalali · Robert Hönig · Maximus Mutschler · Manuel Madeira · Abdurakhmon Sadiev · Egor Shulgin · Alasdair Paren · Pascal Esser · Simon Roburin · Julius Kunze · Agnieszka Słowik · Frederik Benzing · Futong Liu · Hongyi Li · Ryotaro Mitsuboshi · Grigory Malinovsky · Jayadev Naram · Zhize Li · Igor Sokolov · Sharan Vaswani
- 2018 Poster: Online Structured Laplace Approximations for Overcoming Catastrophic Forgetting
  Hippolyt Ritter · Aleksandar Botev · David Barber
- 2018 Poster: Modular Networks: Learning to Decompose Neural Computation
  Louis Kirsch · Julius Kunze · David Barber
- 2018 Poster: Generative Neural Machine Translation
  Harshil Shah · David Barber
- 2017 Poster: Thinking Fast and Slow with Deep Learning and Tree Search
  Thomas Anthony · Zheng Tian · David Barber
- 2017 Poster: Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning
  Zhen He · Shaobing Gao · Liang Xiao · Daxue Liu · Hangen He · David Barber