Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a simple module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. We call the method TRGL for Transport Regularized Greedy Learning and study it theoretically, proving that it leads to greedy modules that are regular and that progressively solve the task. Experimentally, we show improved accuracy of module-wise training of various architectures such as ResNets, Transformers and VGG, when our regularization is added, superior to that of other module-wise training methods and often to end-to-end training, with as much as 60% less memory usage.
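For readers unfamiliar with the minimizing movement scheme mentioned above, its generic form can be sketched as follows. This is the standard scheme for gradient flows in Wasserstein space, not the exact TRGL objective, which the abstract does not state. The symbols used here ($\mathcal{L}$ for the task loss, $\rho_k$ for the data distribution after $k$ modules, $T$ for a module, $\tau$ for the step size) are assumptions introduced for illustration.

```latex
% Generic minimizing movement step with step size \tau > 0:
\rho_{k+1} \in \arg\min_{\rho} \; \mathcal{L}(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho_k)

% For any map T, the squared Wasserstein distance is bounded by the transport cost of T:
W_2^2\big(\rho_k, T_\# \rho_k\big) \le \int \lVert T(x) - x \rVert^2 \, \mathrm{d}\rho_k(x)

% A module-wise analogue (illustrative assumption): each module T_k trades off
% the task loss against how far it moves its inputs, with \rho_{k+1} = (T_k)_\# \rho_k.
T_k \in \arg\min_{T} \; \mathcal{L}\big(T_\# \rho_k\big) + \frac{1}{2\tau} \int \lVert T(x) - x \rVert^2 \, \mathrm{d}\rho_k(x)
```

Under this reading, the penalty term discourages each greedily trained module from moving its inputs further than the task requires, which is one way to interpret the abstract's claim that the modules are regular and solve the task progressively.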
Author Information
Skander Karkar (ISIR, UMR 7222)
Ibrahim Ayed (Sorbonne Université)
Emmanuel de Bézenac (ETH Zürich)
Patrick Gallinari (Sorbonne Université, Criteo AI Lab)
More from the Same Authors
-
2022 : Continuous PDE Dynamics Forecasting with Implicit Neural Representations »
Yuan Yin · Matthieu Kirchmeyer · Jean-Yves Franceschi · Alain Rakotomamonjy · Patrick Gallinari -
2022 : Deep Learning for Model Correction in Cardiac Electrophysiological Imaging »
Victoriya Kashtanova · Patrick Gallinari · Maxime Sermesant -
2023 Poster: Convolutional Neural Operators for robust and accurate learning of PDEs »
Bogdan Raonic · Roberto Molinaro · Tim De Ryck · Tobias Rohner · Francesca Bartolucci · Rima Alaifari · Siddhartha Mishra · Emmanuel de Bézenac -
2023 Poster: Unifying GANs and Score-Based Diffusion as Generative Particle Models »
Jean-Yves Franceschi · Mike Gartrell · Ludovic Dos Santos · Thibaut Issenhuth · Emmanuel de Bézenac · Mickael Chen · Alain Rakotomamonjy -
2023 Poster: Representation Equivalent Neural Operators: a Framework for Alias-free Operator Learning »
Francesca Bartolucci · Emmanuel de Bézenac · Bogdan Raonic · Roberto Molinaro · Siddhartha Mishra · Rima Alaifari -
2023 Poster: Operator Learning with Neural Fields: Tackling PDEs on General Geometries »
Louis Serrano · Lise Le Boudec · Armand Kassaï Koupaï · Thomas X Wang · Yuan Yin · Jean-Noël Vittaut · Patrick Gallinari -
2022 Poster: Diverse Weight Averaging for Out-of-Distribution Generalization »
Alexandre Rame · Matthieu Kirchmeyer · Thibaud Rahier · Alain Rakotomamonjy · Patrick Gallinari · Matthieu Cord -
2022 Poster: AirfRANS: High Fidelity Computational Fluid Dynamics Dataset for Approximating Reynolds-Averaged Navier–Stokes Solutions »
Florent Bonnet · Jocelyn Mazari · Paola Cinnella · Patrick Gallinari -
2021 Poster: LEADS: Learning Dynamical Systems that Generalize Across Environments »
Yuan Yin · Ibrahim Ayed · Emmanuel de Bézenac · Nicolas Baskiotis · Patrick Gallinari -
2020 Poster: Normalizing Kalman Filters for Multivariate Time Series Analysis »
Emmanuel de Bézenac · Syama Sundar Rangapuram · Konstantinos Benidis · Michael Bohlke-Schneider · Richard Kurle · Lorenzo Stella · Hilaf Hasson · Patrick Gallinari · Tim Januschowski -
2019 : Afternoon Coffee Break & Poster Session »
Heidi Komkov · Stanislav Fort · Zhaoyou Wang · Rose Yu · Ji Hwan Park · Samuel Schoenholz · Taoli Cheng · Ryan-Rhys Griffiths · Chase Shimmin · Surya Karthik Mukkavili · Philippe Schwaller · Christian Knoll · Yangzesheng Sun · Keiichi Kisamori · Gavin Graham · Gavin Portwood · Hsin-Yuan Huang · Paul Novello · Moritz Munchmeyer · Anna Jungbluth · Daniel Levine · Ibrahim Ayed · Steven Atkinson · Jan Hermann · Peter Grönquist · · Priyabrata Saha · Yannik Glaser · Lingge Li · Yutaro Iiyama · Rushil Anirudh · Maciej Koch-Janusz · Vikram Sundar · Francois Lanusse · Auralee Edelen · Jonas Köhler · Jacky H. T. Yip · jiadong guo · Xiangyang Ju · Adi Hanuka · Adrian Albert · Valentina Salvatelli · Mauro Verzetti · Javier Duarte · Eric Moreno · Emmanuel de Bézenac · Athanasios Vlontzos · Alok Singh · Thomas Klijnsma · Brad Neuberg · Paul Wright · Mustafa Mustafa · David Schmidt · Steven Farrell · Hao Sun -
2013 Poster: Robust Bloom Filters for Large MultiLabel Classification Tasks »
Moustapha M Cisse · Nicolas Usunier · Thierry Artières · Patrick Gallinari -
2012 Poster: On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking »
Clément Calauzènes · Nicolas Usunier · Patrick Gallinari -
2012 Oral: On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking »
Clément Calauzènes · Nicolas Usunier · Patrick Gallinari