Stochastic gradient descent (SGD) remains the method of choice for deep learning, despite the limitations arising for ill-behaved objective functions. In cases where it could be estimated, the natural gradient has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, and it has yet to find a practical implementation that would scale to very deep and large networks. Here, we derive an exact expression for the natural gradient in deep linear networks, which exhibit pathological curvature similar to the nonlinear case. We provide for the first time an analytical solution for its convergence rate, showing that the loss decreases exponentially to the global minimum in parameter space. Our expression for the natural gradient is surprisingly simple, computationally tractable, and explains why some approximations proposed previously work well in practice. This opens new avenues for approximating the natural gradient in the nonlinear case, and we show in preliminary experiments that our online natural gradient descent outperforms SGD on MNIST autoencoding while sharing its computational simplicity.
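To make the contrast between plain gradient descent and natural gradient descent concrete, here is a minimal sketch on a single-layer linear regression problem with badly conditioned inputs. This is a textbook simplification, not the deep-network expression derived in the paper: it assumes Gaussian output noise with unit variance, so that the Fisher information reduces to the input covariance, and all names (`make_problem`, `run`, `lr`, and so on) are illustrative choices rather than anything taken from the authors' code.

```python
import numpy as np


def make_problem(seed=0, n=500, d_in=5, d_out=3):
    """Noiseless linear regression with ill-conditioned inputs (assumed toy setup)."""
    rng = np.random.default_rng(seed)
    scales = np.logspace(0, -2, d_in)            # input std ranges from 1 down to 0.01
    X = rng.standard_normal((n, d_in)) * scales  # rows are input samples
    W_true = rng.standard_normal((d_out, d_in))
    Y = X @ W_true.T                             # targets generated by the true weights
    return X, Y


def loss(W, X, Y):
    """Mean over samples of half the squared error."""
    R = X @ W.T - Y
    return 0.5 * np.mean(np.sum(R ** 2, axis=1))


def gradient(W, X, Y):
    """Gradient of the loss with respect to W, shape (d_out, d_in)."""
    R = X @ W.T - Y
    return R.T @ X / len(X)


def run(X, Y, steps=50, lr=0.5, natural=False):
    """Full-batch descent from W = 0; returns the loss after each step."""
    W = np.zeros((Y.shape[1], X.shape[1]))
    Sigma = X.T @ X / len(X)                     # empirical input covariance
    losses = [loss(W, X, Y)]
    for _ in range(steps):
        G = gradient(W, X, Y)
        if natural:
            # Under the unit-variance Gaussian assumption the Fisher matrix is
            # the input covariance, so the natural gradient is G @ inv(Sigma).
            # Sigma is symmetric, hence the transpose trick with solve().
            G = np.linalg.solve(Sigma, G.T).T
        W = W - lr * G
        losses.append(loss(W, X, Y))
    return np.array(losses)


if __name__ == "__main__":
    X, Y = make_problem()
    gd = run(X, Y, natural=False)
    ngd = run(X, Y, natural=True)
    print("final loss, gradient descent:        ", gd[-1])
    print("final loss, natural gradient descent:", ngd[-1])
```

In this toy setting the natural gradient update shrinks the weight error by the same factor `1 - lr` in every direction, so the loss decays geometrically regardless of how poorly conditioned the inputs are, whereas plain gradient descent stalls along the low-variance input directions.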
Author Information
Alberto Bernacchia (University of Cambridge)
Mate Lengyel (University of Cambridge)
Guillaume Hennequin (Cambridge)
More from the Same Authors
- 2022 Poster: Training stochastic stabilized supralinear networks by dynamics-neutral growth
  Wayne Soo · Mate Lengyel
- 2021 Poster: Scalable Bayesian GPFA with automatic relevance determination and discrete noise models
  Kristopher Jensen · Ta-Chu Kao · Jasmine Stone · Guillaume Hennequin
- 2021 Poster: Natural continual learning: success is a journey, not (just) a destination
  Ta-Chu Kao · Kristopher Jensen · Gido van de Ven · Alberto Bernacchia · Guillaume Hennequin
- 2021 Poster: A universal probabilistic spike count model reveals ongoing modulation of neural variability
  David Liu · Mate Lengyel
- 2020 Poster: Manifold GPLVMs for discovering non-Euclidean latent structure in neural data
  Kristopher Jensen · Ta-Chu Kao · Marco Tripodi · Guillaume Hennequin
- 2020 Poster: Non-reversible Gaussian processes for identifying latent dynamical structure in neural data
  Virginia Rutten · Alberto Bernacchia · Maneesh Sahani · Guillaume Hennequin
- 2020 Oral: Non-reversible Gaussian processes for identifying latent dynamical structure in neural data
  Virginia Rutten · Alberto Bernacchia · Maneesh Sahani · Guillaume Hennequin
- 2016 Poster: Efficient state-space modularization for planning: theory, behavioral and neural signatures
  Daniel McNamee · Daniel M Wolpert · Mate Lengyel
- 2014 Poster: Analog Memories in a Balanced Rate-Based Network of E-I Neurons
  Dylan Festa · Guillaume Hennequin · Mate Lengyel
- 2014 Poster: A Dual Algorithm for Olfactory Computation in the Locust Brain
  Sina Tootoonian · Mate Lengyel
- 2014 Oral: Analog Memories in a Balanced Rate-Based Network of E-I Neurons
  Dylan Festa · Guillaume Hennequin · Mate Lengyel
- 2014 Poster: Fast Sampling-Based Inference in Balanced Neuronal Networks
  Guillaume Hennequin · Laurence Aitchison · Mate Lengyel
- 2013 Poster: Correlations strike back (again): the case of associative memory retrieval
  Cristina Savin · Peter Dayan · Mate Lengyel
- 2013 Oral: Correlations strike back (again): the case of associative memory retrieval
  Cristina Savin · Peter Dayan · Mate Lengyel
- 2011 Session: Oral Session 11
  Mate Lengyel
- 2011 Poster: Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories
  Cristina Savin · Peter Dayan · Mate Lengyel
- 2011 Poster: Active dendrites: adaptation to spike-based communication
  Balazs B Ujfalussy · Mate Lengyel
- 2011 Spotlight: Active dendrites: adaptation to spike-based communication
  Balazs B Ujfalussy · Mate Lengyel
- 2009 Workshop: Normative electrophysiology: Explaining cellular properties of neurons from first principles
  Jean-Pascal Pfister · Mate Lengyel
- 2009 Poster: Know Thy Neighbour: A Normative Theory of Synaptic Depression
  Jean-Pascal Pfister · Peter Dayan · Mate Lengyel
- 2009 Oral: Know Thy Neighbour: A Normative Theory of Synaptic Depression
  Jean-Pascal Pfister · Peter Dayan · Mate Lengyel
- 2007 Oral: Hippocampal Contributions to Control: The Third Way
  Mate Lengyel · Peter Dayan
- 2007 Poster: Hippocampal Contributions to Control: The Third Way
  Mate Lengyel · Peter Dayan
- 2006 Poster: Uncertainty, phase and oscillatory hippocampal recall
  Mate Lengyel · Peter Dayan
- 2006 Talk: Uncertainty, phase and oscillatory hippocampal recall
  Mate Lengyel · Peter Dayan