Diverse Weight Averaging for Out-of-Distribution Generalization
Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA) strategies were shown to perform best on the competitive DomainBed benchmark; they directly average the weights of multiple networks despite their nonlinearities. In this paper, we propose Diverse Weight Averaging (DiWA), a new WA strategy whose main motivation is to increase the functional diversity across averaged models. To this end, DiWA averages weights obtained from several independent training runs: indeed, models obtained from different runs are more diverse than those collected along a single run thanks to differences in hyperparameters and training procedures. We motivate the need for diversity by a new bias-variance-covariance-locality decomposition of the expected error, exploiting similarities between WA and standard functional ensembling. Moreover, this decomposition highlights that WA succeeds when the variance term dominates, which we show occurs when the marginal distribution changes at test time. Experimentally, DiWA consistently improves the state of the art on DomainBed without inference overhead.
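To make the weight-averaging recipe described in the abstract concrete, below is a minimal sketch of uniformly averaging the parameters of several independently trained networks that share the same architecture (e.g., fine-tuned from the same pre-trained initialization). It is an illustration only, not the paper's released code; the checkpoint paths and the `build_model` helper are hypothetical.

```python
import torch

def average_checkpoints(checkpoint_paths):
    """Uniformly average the weights of models from independent training runs.

    All checkpoints are assumed to share the same architecture, as required
    for weight averaging to be meaningful.
    """
    avg_state = None
    for path in checkpoint_paths:
        state = torch.load(path, map_location="cpu")
        if avg_state is None:
            # Initialize the running sum with a float copy of the first checkpoint.
            avg_state = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg_state[k] += v.float()
    num_models = len(checkpoint_paths)
    return {k: v / num_models for k, v in avg_state.items()}

# Hypothetical usage: load the averaged weights into a single model, so
# inference costs the same as one network (no ensembling overhead).
# model = build_model()  # same architecture as the averaged runs
# model.load_state_dict(average_checkpoints(["run1.pt", "run2.pt", "run3.pt"]))
```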
Author Information
Alexandre Rame (FAIR Meta AI - ISIR)
Currently a research intern at FAIR, Meta AI. PhD student at Sorbonne University in Paris under the supervision of Professor Matthieu Cord, working on making deep neural networks generalize out of distribution.
Matthieu Kirchmeyer (Sorbonne Université & Criteo)
Thibaud Rahier (Criteo AI Lab)
École polytechnique graduate (Diplôme d'ingénieur polytechnicien), with a major in Applied Mathematics and minors in Mathematics and Computer Science. UC Berkeley graduate (M.A. in Statistics). PhD in Machine Learning (CIFRE) between INRIA and Schneider Electric in Grenoble, France. Researcher at Criteo AI Lab in Grenoble, France.
Alain Rakotomamonjy (Université de Rouen Normandie & Criteo AI Lab)
Patrick Gallinari (Sorbonne Université, Criteo AI Lab)
Matthieu Cord (Sorbonne University)
More from the Same Authors
- 2022: Continuous PDE Dynamics Forecasting with Implicit Neural Representations »
  Yuan Yin · Matthieu Kirchmeyer · Jean-Yves Franceschi · Alain Rakotomamonjy · Patrick Gallinari
- 2022: Multi-Modal 3D GAN for Urban Scenes »
  Loïck Chambon · Mickael Chen · Tuan-Hung VU · Alexandre Boulch · Andrei Bursuc · Matthieu Cord · Patrick Pérez
- 2022: Pre-train, fine-tune, interpolate: a three-stage strategy for domain generalization »
  Alexandre Rame · Jianyu Zhang · Leon Bottou · David Lopez-Paz
- 2022: Deep Learning for Model Correction in Cardiac Electrophysiological Imaging »
  Victoriya Kashtanova · Patrick Gallinari · Maxime Sermesant
- 2022 Poster: Benchopt: Reproducible, efficient and collaborative optimization benchmarks »
  Thomas Moreau · Mathurin Massias · Alexandre Gramfort · Pierre Ablin · Pierre-Antoine Bannier · Benjamin Charlier · Mathieu Dagréou · Tom Dupre la Tour · Ghislain DURIF · Cassio F. Dantas · Quentin Klopfenstein · Johan Larsson · En Lai · Tanguy Lefort · Benoît Malézieux · Badr MOUFAD · Binh T. Nguyen · Alain Rakotomamonjy · Zaccharie Ramzi · Joseph Salmon · Samuel Vaiter
- 2022 Poster: AirfRANS: High Fidelity Computational Fluid Dynamics Dataset for Approximating Reynolds-Averaged Navier–Stokes Solutions »
  Florent Bonnet · Jocelyn Mazari · Paola Cinnella · Patrick Gallinari
- 2022 Poster: SInGE: Sparsity via Integrated Gradients Estimation of Neuron Relevance »
  Edouard YVINEC · Arnaud Dapogny · Matthieu Cord · Kevin Bailly
- 2021 Poster: LEADS: Learning Dynamical Systems that Generalize Across Environments »
  Yuan Yin · Ibrahim Ayed · Emmanuel de Bézenac · Nicolas Baskiotis · Patrick Gallinari
- 2021 Poster: Photonic Differential Privacy with Direct Feedback Alignment »
  Ruben Ohana · Hamlet Medina · Julien Launay · Alessandro Cappelli · Iacopo Poli · Liva Ralaivola · Alain Rakotomamonjy
- 2020 Poster: Online Non-Convex Optimization with Imperfect Feedback »
  Amélie Héliou · Matthieu Martin · Panayotis Mertikopoulos · Thibaud Rahier
- 2020 Poster: Normalizing Kalman Filters for Multivariate Time Series Analysis »
  Emmanuel de Bézenac · Syama Sundar Rangapuram · Konstantinos Benidis · Michael Bohlke-Schneider · Richard Kurle · Lorenzo Stella · Hilaf Hasson · Patrick Gallinari · Tim Januschowski
- 2019 Poster: Screening Sinkhorn Algorithm for Regularized Optimal Transport »
  Mokhtar Z. Alaya · Maxime Berar · Gilles Gasso · Alain Rakotomamonjy
- 2019 Poster: Singleshot: a scalable Tucker tensor decomposition »
  Abraham Traore · Maxime Berar · Alain Rakotomamonjy
- 2017 Poster: Joint distribution optimal transportation for domain adaptation »
  Nicolas Courty · Rémi Flamary · Amaury Habrard · Alain Rakotomamonjy
- 2013 Poster: Robust Bloom Filters for Large MultiLabel Classification Tasks »
  Moustapha M Cisse · Nicolas Usunier · Thierry Artières · Patrick Gallinari
- 2012 Poster: Multiple Operator-valued Kernel Learning »
  Hachem Kadri · Alain Rakotomamonjy · Francis Bach · philippe preux
- 2012 Poster: On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking »
  Clément Calauzènes · Nicolas Usunier · Patrick Gallinari
- 2012 Oral: On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking »
  Clément Calauzènes · Nicolas Usunier · Patrick Gallinari
- 2010 Workshop: New Directions in Multiple Kernel Learning »
  Marius Kloft · Ulrich Rueckert · Cheng Soon Ong · Alain Rakotomamonjy · Soeren Sonnenburg · Francis Bach
- 2009 Workshop: Temporal Segmentation: Perspectives from Statistics, Machine Learning, and Signal Processing »
  Stephane Canu · Olivier Cappé · Arthur Gretton · Zaid Harchaoui · Alain Rakotomamonjy · Jean-Philippe Vert
- 2008 Poster: Support Vector Machines with a Reject Option »
  Yves Grandvalet · Joseph Keshet · Alain Rakotomamonjy · Stephane Canu