NeurIPS Non Vanishing Gradients for Arbitrarily Deep Neural Networks: a Hamiltonian System Approach

Poster in Workshop: The Symbiosis of Deep Learning and Differential Equations

Non Vanishing Gradients for Arbitrarily Deep Neural Networks: a Hamiltonian System Approach

Clara Galimberti · Luca Furieri


Abstract:

Training Deep Neural Networks (DNNs) can be difficult due to vanishing or exploding gradients during weight optimization through backpropagation. To address this problem, we propose a general class of Hamiltonian DNNs (H-DNNs) that stems from the discretization of continuous-time Hamiltonian systems. Our main result is that a broad set of H-DNNs ensures non-vanishing gradients by design for arbitrary network depth. We obtain this by proving that, under a semi-implicit Euler discretization scheme, the backward sensitivity matrices involved in gradient computations are symplectic.
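
The symplecticity claim can be checked numerically on a small example. The sketch below is illustrative only and is not taken from the authors' code. It assumes a separable Hamiltonian H(p, q) = sum(log cosh(Kp p + bp)) + sum(log cosh(Kq q + bq)), a canonical structure matrix J, and random layer weights. The function names, weight scaling, and step size h are hypothetical choices. Each layer applies a semi-implicit Euler step (p is updated first, then q uses the new p), and the layer Jacobians are multiplied to form an end-to-end sensitivity matrix M, which should satisfy M^T J M = J up to rounding.

```python
import numpy as np


def layer_jacobian(p, q, Kp, bp, Kq, bq, h):
    """One semi-implicit Euler layer and its Jacobian w.r.t. (p, q)."""
    n = p.size
    # p-update uses the current q: p_next = p - h * dH/dq(q)
    zq = Kq @ q + bq
    p_next = p - h * Kq.T @ np.tanh(zq)
    # q-update uses the *new* p: q_next = q + h * dH/dp(p_next)
    zp = Kp @ p_next + bp
    q_next = q + h * Kp.T @ np.tanh(zp)
    # Symmetric Hessian blocks of H, using d/dz tanh(z) = 1 - tanh(z)^2
    A = Kq.T @ np.diag(1.0 - np.tanh(zq) ** 2) @ Kq
    B = Kp.T @ np.diag(1.0 - np.tanh(zp) ** 2) @ Kp
    I = np.eye(n)
    # Chain rule through the two sub-steps gives the block Jacobian
    M = np.block([[I, -h * A], [h * B, I - h**2 * B @ A]])
    return p_next, q_next, M


def symplectic_defect(M):
    """Largest entry of |M^T J M - J| for the canonical J (0 if symplectic)."""
    n = M.shape[0] // 2
    Z, I = np.zeros((n, n)), np.eye(n)
    J = np.block([[Z, I], [-I, Z]])
    return np.max(np.abs(M.T @ J @ M - J))


def end_to_end_jacobian(n=3, depth=20, h=0.1, seed=0):
    """Product of layer Jacobians for a randomly weighted network (assumed setup)."""
    rng = np.random.default_rng(seed)
    p, q = rng.standard_normal(n), rng.standard_normal(n)
    total = np.eye(2 * n)
    for _ in range(depth):
        Kp, Kq = 0.5 * rng.standard_normal((2, n, n))  # hypothetical weight scale
        bp, bq = rng.standard_normal((2, n))
        p, q, M = layer_jacobian(p, q, Kp, bp, Kq, bq, h)
        total = M @ total  # later layers multiply on the left
    return total


if __name__ == "__main__":
    total = end_to_end_jacobian()
    print("symplectic defect:", symplectic_defect(total))
    print("spectral norm:", np.linalg.norm(total, 2))
```

Since a symplectic matrix M satisfies M^T J M = J and J has spectral norm 1, the spectral norm of M cannot fall below 1. In this toy setting, that is the sense in which the sensitivity matrix cannot vanish as depth grows. Nothing in this argument bounds the norm from above.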

