Poster
Do Residual Neural Networks discretize Neural Ordinary Differential Equations?
Michael Sander · Pierre Ablin · Gabriel Peyré
Neural Ordinary Differential Equations (Neural ODEs) are the continuous analog of Residual Neural Networks (ResNets). We investigate whether the discrete dynamics defined by a ResNet are close to the continuous dynamics of a Neural ODE. We first quantify the distance between the ResNet's hidden state trajectory and the solution of its corresponding Neural ODE. Our bound is tight and, on the negative side, does not go to $0$ with depth $N$ if the residual functions are not smooth with depth. On the positive side, we show that this smoothness is preserved by gradient descent for a ResNet with linear residual functions and small enough initial loss. It ensures an implicit regularization towards a limit Neural ODE at rate $\frac{1}{N}$, uniformly with depth and optimization time. As a byproduct of our analysis, we consider the use of a memory-free discrete adjoint method to train a ResNet by recovering the activations on the fly through a backward pass of the network, and show that this method theoretically succeeds at large depth if the residual functions are Lipschitz with the input. We then show that Heun's method, a second order ODE integration scheme, allows for better gradient estimation with the adjoint method when the residual functions are smooth with depth. We experimentally validate that our adjoint method succeeds at large depth, and that Heun's method needs fewer layers to succeed. We finally use the adjoint method successfully for fine-tuning very deep ResNets without memory consumption in the residual layers.
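The two mechanisms in the abstract — a ResNet as an explicit Euler discretization of a Neural ODE, and the memory-free recovery of activations through a backward pass — can be illustrated with a minimal NumPy sketch. Everything here (tanh residual functions, the weight scaling, depth $N = 1000$) is an illustrative assumption, not the paper's exact architecture or experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 4  # depth and hidden-state dimension (illustrative choices)

# Residual functions f(x, theta_n) = V_n tanh(W_n x); the scaling keeps
# the Lipschitz constant of f in x of order 1.
W = rng.standard_normal((N, d, d)) / (2 * np.sqrt(d))
V = rng.standard_normal((N, d, d)) / (2 * np.sqrt(d))

def f(x, n):
    return V[n] @ np.tanh(W[n] @ x)

def forward(x0):
    """ResNet forward pass: one explicit Euler step of size 1/N per layer,
    x_{n+1} = x_n + f(x_n, theta_n) / N."""
    x = x0
    for n in range(N):
        x = x + f(x, n) / N
    return x

def recover_input(xN):
    """Memory-free reconstruction: invert each residual step on the fly
    with the approximation x_n ≈ x_{n+1} - f(x_{n+1}, theta_n) / N,
    so no activation needs to be stored during the forward pass."""
    x = xN
    for n in reversed(range(N)):
        x = x - f(x, n) / N
    return x

x0 = rng.standard_normal(d)
xN = forward(x0)
x0_rec = recover_input(xN)
print(np.linalg.norm(x0 - x0_rec))  # small when f is Lipschitz and N is large
```

The per-step inversion error is $O(1/N^2)$ when $f$ is Lipschitz in the input, so the accumulated reconstruction error over $N$ layers is $O(1/N)$ — consistent with the claim that the adjoint method succeeds at large depth.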
Author Information
Michael Sander (ENS Ulm, CNRS, Paris)
Pierre Ablin (Apple)
Gabriel Peyré (CNRS and ENS)
More from the Same Authors
- 2022 Poster: Benchopt: Reproducible, efficient and collaborative optimization benchmarks
  Thomas Moreau · Mathurin Massias · Alexandre Gramfort · Pierre Ablin · Pierre-Antoine Bannier · Benjamin Charlier · Mathieu Dagréou · Tom Dupre la Tour · Ghislain Durif · Cassio F. Dantas · Quentin Klopfenstein · Johan Larsson · En Lai · Tanguy Lefort · Benoît Malézieux · Badr Moufad · Binh T. Nguyen · Alain Rakotomamonjy · Zaccharie Ramzi · Joseph Salmon · Samuel Vaiter
- 2022 Poster: A framework for bilevel optimization that enables stochastic and global variance reduction algorithms
  Mathieu Dagréou · Pierre Ablin · Samuel Vaiter · Thomas Moreau
- 2022 Poster: On global convergence of ResNets: From finite to infinite width using linear parameterization
  Raphaël Barboni · Gabriel Peyré · Francois-Xavier Vialard
- 2022 Poster: Vision Transformers provably learn spatial structure
  Samy Jelassi · Michael Sander · Yuanzhi Li
- 2021 Workshop: Optimal Transport and Machine Learning
  Jason Altschuler · Charlotte Bunne · Laetitia Chapel · Marco Cuturi · Rémi Flamary · Gabriel Peyré · Alexandra Suvorikova
- 2021 Poster: Shared Independent Component Analysis for Multi-Subject Neuroimaging
  Hugo Richard · Pierre Ablin · Bertrand Thirion · Alexandre Gramfort · Aapo Hyvarinen
- 2020 Poster: Modeling Shared responses in Neuroimaging Studies through MultiView ICA
  Hugo Richard · Luigi Gresele · Aapo Hyvarinen · Bertrand Thirion · Alexandre Gramfort · Pierre Ablin
- 2020 Poster: Faster Wasserstein Distance Estimation with the Sinkhorn Divergence
  Lénaïc Chizat · Pierre Roussillon · Flavien Léger · François-Xavier Vialard · Gabriel Peyré
- 2020 Poster: Online Sinkhorn: Optimal Transport distances from sample streams
  Arthur Mensch · Gabriel Peyré
- 2020 Spotlight: Modeling Shared responses in Neuroimaging Studies through MultiView ICA
  Hugo Richard · Luigi Gresele · Aapo Hyvarinen · Bertrand Thirion · Alexandre Gramfort · Pierre Ablin
- 2020 Oral: Online Sinkhorn: Optimal Transport distances from sample streams
  Arthur Mensch · Gabriel Peyré
- 2020 Poster: Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form
  Hicham Janati · Boris Muzellec · Gabriel Peyré · Marco Cuturi
- 2020 Oral: Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form
  Hicham Janati · Boris Muzellec · Gabriel Peyré · Marco Cuturi
- 2019 Workshop: Optimal Transport for Machine Learning
  Marco Cuturi · Gabriel Peyré · Rémi Flamary · Alexandra Suvorikova
- 2019 Poster: Learning step sizes for unfolded sparse coding
  Pierre Ablin · Thomas Moreau · Mathurin Massias · Alexandre Gramfort
- 2019 Poster: Manifold-regression to predict from MEG/EEG brain signals without source modeling
  David Sabbagh · Pierre Ablin · Gael Varoquaux · Alexandre Gramfort · Denis A. Engemann
- 2019 Poster: Universal Invariant and Equivariant Graph Neural Networks
  Nicolas Keriven · Gabriel Peyré