We propose a novel second-order optimization framework for training emerging deep continuous-time models, specifically Neural Ordinary Differential Equations (Neural ODEs). Since their training already involves expensive gradient computation by solving a backward ODE, deriving efficient second-order methods becomes highly nontrivial. Nevertheless, inspired by the recent Optimal Control (OC) interpretation of training deep networks, we show that a specific continuous-time OC methodology, called Differential Programming, can be adopted to derive backward ODEs for higher-order derivatives at the same O(1) memory cost. We further explore a low-rank representation of the second-order derivatives and show that it leads to efficient preconditioned updates with the aid of Kronecker-based factorization. The resulting method – named SNOpt – converges much faster than first-order baselines in wall-clock time, and the improvement remains consistent across various applications, e.g., image classification, generative flow, and time-series prediction. Our framework also enables direct architecture optimization, such as the integration time of Neural ODEs, with second-order feedback policies, strengthening the OC perspective as a principled tool for analyzing optimization in deep learning. Our code is available at https://github.com/ghliu/snopt.
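As a minimal illustration of the O(1)-memory backward-ODE idea the abstract builds on (the first-order continuous adjoint, not SNOpt's second-order method itself), the sketch below integrates a scalar ODE forward and then solves the adjoint ODE backward to recover a parameter gradient. The dynamics `f`, the function `adjoint_grad`, and the quadratic terminal loss are hypothetical stand-ins chosen so the result can be checked analytically.

```python
import math

def f(z, theta):
    # Hypothetical scalar dynamics dz/dt = f(z; theta) = theta * z,
    # standing in for a neural-network vector field.
    return theta * z

def adjoint_grad(z0, theta, T=1.0, n=10_000):
    """Forward-integrate z, then solve the backward adjoint ODE to get
    dL/dtheta for the terminal loss L = z(T)^2 / 2, using Euler steps."""
    dt = T / n
    # Forward pass: only the terminal state z(T) needs to be kept -- this is
    # the source of the O(1) memory cost mentioned in the abstract.
    z = z0
    for _ in range(n):
        z = z + dt * f(z, theta)
    zT = z
    # Backward pass: a(T) = dL/dz(T) = z(T), and da/dt = -a * df/dz = -a * theta.
    # The parameter gradient is dL/dtheta = integral_0^T a(t) * df/dtheta dt,
    # where df/dtheta = z, so z is re-integrated backward alongside a.
    a, grad = zT, 0.0
    for _ in range(n):
        grad += dt * a * z
        a = a + dt * a * theta      # step a from t down to t - dt
        z = z - dt * f(z, theta)    # step z from t down to t - dt
    return zT, grad

zT, grad = adjoint_grad(z0=1.0, theta=0.5)
# Analytic reference: z(T) = exp(theta * T), dL/dtheta = z(T)^2 * T.
print(zT, grad)
```

SNOpt extends this adjoint machinery to higher-order derivatives, which it then preconditions via Kronecker-based factorization; this sketch covers only the first-order backward ODE.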
Author Information
Guan-Horng Liu (Georgia Institute of Technology)
Tianrong Chen (Georgia Institute of Technology)
Evangelos Theodorou (Georgia Institute of Technology)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Second-Order Neural ODE Optimizer (Thu, Dec 9, 12:30 -- 02:00 AM)
More from the Same Authors
- 2021: Likelihood Training of Schrödinger Bridges using Forward-Backward SDEs Theory (Tianrong Chen · Guan-Horng Liu · Evangelos Theodorou)
- 2022: Data-driven discovery of non-Newtonian astronomy via learning non-Euclidean Hamiltonian (Oswin So · Gongjie Li · Evangelos Theodorou · Molei Tao)
- 2022 Panel: Panel 6B-3: Exponential Family Model-Based… & Deep Generalized Schrödinger… (Guan-Horng Liu · Gene Li)
- 2022: Invited Talk: Guan-Horng Liu (Guan-Horng Liu)
- 2022 Poster: Deep Generalized Schrödinger Bridge (Guan-Horng Liu · Tianrong Chen · Oswin So · Evangelos Theodorou)
- 2020: Contributed talks in Session 2 (Zoom) (Martin Takac · Samuel Horváth · Guan-Horng Liu · Nicolas Loizou · Sharan Vaswani)
- 2020: Contributed Video: DDPNOpt: Differential Dynamic Programming Neural Optimizer, Guan-Horng Liu (Guan-Horng Liu)
- 2020: Poster Session 1 (gather.town) (Laurent Condat · Tiffany Vlaar · Ohad Shamir · Mohammadi Zaki · Zhize Li · Guan-Horng Liu · Samuel Horváth · Mher Safaryan · Yoni Choukroun · Kumar Shridhar · Nabil Kahale · Jikai Jin · Pratik Kumar Jawanpuria · Gaurav Kumar Yadav · Kazuki Koyama · Junyoung Kim · Xiao Li · Saugata Purkayastha · Adil Salim · Dighanchal Banerjee · Peter Richtarik · Lakshman Mahto · Tian Ye · Bamdev Mishra · Huikang Liu · Jiajie Zhu)