Skip to yearly menu bar Skip to main content


Improving Neural Ordinary Differential Equations with Nesterov's Accelerated Gradient Method

Ho Huu Nghia Nguyen · Tan Nguyen · Huyen Vo · Stanley Osher · Thieu Vo

Hall J (level 1) #618

Keywords: [ momentum ] [ nesterov ] [ neural ordinary differential equations ]

Abstract: We propose the Nesterov neural ordinary differential equations (NesterovNODEs), whose layers solve the second-order ordinary differential equations (ODEs) limit of Nesterov's accelerated gradient (NAG) method, and a generalization called GNesterovNODEs. Taking the advantage of the convergence rate $\mathcal{O}(1/k^{2})$ of the NAG scheme, GNesterovNODEs speed up training and inference by reducing the number of function evaluations (NFEs) needed to solve the ODEs. We also prove that the adjoint state of a GNesterovNODEs also satisfies a GNesterovNODEs, thus accelerating both forward and backward ODE solvers and allowing the model to be scaled up for large-scale tasks. We empirically corroborate the advantage of GNesterovNODEs on a wide range of practical applications, including point cloud separation, image classification, and sequence modeling. Compared to NODEs, GNesterovNODEs require a significantly smaller number of NFEs while achieving better accuracy across our experiments.

Chat is not available.