Recently, flat-minima optimizers, which seek parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have received significant attention due to their scalability: (1) Stochastic Weight Averaging (SWA) and (2) Sharpness-Aware Minimization (SAM). However, there has been limited investigation into their properties and no systematic benchmarking of them across different domains. We fill this gap by comparing the loss surfaces of models trained with each method and by benchmarking broadly across computer vision, natural language processing, and graph representation learning tasks. These results yield several surprising findings, which we hope will help researchers further improve deep learning optimizers and help practitioners identify the right optimizer for their problem.
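For readers unfamiliar with the two methods named above, the following is a minimal sketch of the commonly described update rules, not the authors' implementation. It assumes a toy two-dimensional quadratic loss. The function names (`sam_step`, `swa_update`, `train`) and the hyperparameters (`lr`, `rho`, `swa_start`) are hypothetical choices made for illustration. SAM takes the gradient at a point perturbed in the normalized gradient direction, and SWA keeps a running average of the weights visited late in training.

```python
import math

def loss(w):
    # Toy quadratic loss (an assumption for illustration only).
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [w[0], 10.0 * w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    # SAM-style step: move to a nearby point of (approximately) higher loss,
    # then descend using the gradient evaluated at that perturbed point.
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    w_perturbed = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_perturbed = grad(w_perturbed)
    return [wi - lr * gi for wi, gi in zip(w, g_perturbed)]

def swa_update(w_avg, w, n_averaged):
    # SWA-style running mean over the weights collected so far.
    return [(a * n_averaged + wi) / (n_averaged + 1) for a, wi in zip(w_avg, w)]

def train(steps=200, swa_start=100):
    # Hypothetical schedule: SAM steps throughout, averaging after swa_start.
    w = [1.0, 1.0]
    w_avg, n_averaged = list(w), 0
    for t in range(steps):
        w = sam_step(w)
        if t >= swa_start:
            w_avg = swa_update(w_avg, w, n_averaged)
            n_averaged += 1
    return w, w_avg

if __name__ == "__main__":
    final_w, averaged_w = train()
    print("final loss:", loss(final_w))
    print("averaged-weights loss:", loss(averaged_w))
```

Combining the two in one loop is a choice made here for brevity. The abstract treats SWA and SAM as separate methods, and either can be used on its own.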
Author Information
Jean Kaddour (University College London)
Linqing Liu (University College London)
Ricardo Silva (University College London)
Matt Kusner (University College London)
More from the Same Authors
-
2022 : Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging »
Jean Kaddour -
2022 : Pragmatic Fairness: Optimizing Policies with Outcome Disparity Control »
Limor Gultchin · Siyuan Guo · Alan Malek · Silvia Chiappa · Ricardo Silva -
2022 : Evaluating the Impact of Geometric and Statistical Skews on Out-Of-Distribution Generalization Performance »
Aengus Lynch · Jean Kaddour · Ricardo Silva -
2022 : Partial identification without distributional assumptions »
Kirtan Padh · Jakob Zeitler · David Watson · Matt Kusner · Ricardo Silva · Niki Kilbertus -
2023 Poster: On Efficient Training Algorithms For Transformer Language Models »
Jean Kaddour · Oscar Key · Piotr Nawrot · Pasquale Minervini · Matt Kusner -
2023 Poster: Intervention Generalization: A View from Factor Graph Models »
Gecia Bravo-Hermsdorff · David Watson · Jialin Yu · Jakob Zeitler · Ricardo Silva -
2023 Poster: Evaluating Self-Supervised Learning for Molecular Graph Embeddings »
Hanchen Wang · Jean Kaddour · Shengchao Liu · Jian Tang · Joan Lasenby · Qi Liu -
2022 Workshop: Algorithmic Fairness through the Lens of Causality and Privacy »
Awa Dieng · Miriam Rateike · Golnoosh Farnadi · Ferdinando Fioretto · Matt Kusner · Jessica Schrouff -
2022 Poster: Local Latent Space Bayesian Optimization over Structured Inputs »
Natalie Maus · Haydn Jones · Juston Moore · Matt Kusner · John Bradshaw · Jacob Gardner -
2021 : Ricardo Silva - The Road to Causal Programming »
Ricardo Silva -
2021 Poster: Causal Effect Inference for Structured Treatments »
Jean Kaddour · Yuchen Zhu · Qi Liu · Matt Kusner · Ricardo Silva -
2020 Workshop: Machine Learning for Molecules »
José Miguel Hernández-Lobato · Matt Kusner · Brooks Paige · Marwin Segler · Jennifer Wei -
2020 : Invited Talk: On Prediction, Action and Interference »
Ricardo Silva -
2020 Poster: Probabilistic Active Meta-Learning »
Jean Kaddour · Steindor Saemundsson · Marc Deisenroth -
2020 Poster: A Class of Algorithms for General Instrumental Variable Models »
Niki Kilbertus · Matt Kusner · Ricardo Silva -
2020 Poster: Barking up the right tree: an approach to search over molecule synthesis DAGs »
John Bradshaw · Brooks Paige · Matt Kusner · Marwin Segler · José Miguel Hernández-Lobato -
2020 Spotlight: Barking up the right tree: an approach to search over molecule synthesis DAGs »
John Bradshaw · Brooks Paige · Matt Kusner · Marwin Segler · José Miguel Hernández-Lobato -
2019 Poster: A Model to Search for Synthesizable Molecules »
John Bradshaw · Brooks Paige · Matt Kusner · Marwin Segler · José Miguel Hernández-Lobato -
2018 Poster: Bayesian Semi-supervised Learning with Graph Gaussian Processes »
Yin Cheng Ng · Nicolò Colombo · Ricardo Silva -
2017 Workshop: From 'What If?' To 'What Next?' : Causal Inference and Machine Learning for Intelligent Decision Making »
Ricardo Silva · Panagiotis Toulis · John Shawe-Taylor · Alexander Volfovsky · Thorsten Joachims · Lihong Li · Nathan Kallus · Adith Swaminathan -
2017 Poster: Counterfactual Fairness »
Matt Kusner · Joshua Loftus · Chris Russell · Ricardo Silva -
2017 Oral: Counterfactual Fairness »
Matt Kusner · Joshua Loftus · Chris Russell · Ricardo Silva -
2017 Poster: Tomography of the London Underground: a Scalable Model for Origin-Destination Data »
Nicolò Colombo · Ricardo Silva · Soong Moon Kang -
2017 Poster: When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness »
Chris Russell · Matt Kusner · Joshua Loftus · Ricardo Silva -
2016 Workshop: "What If?" Inference and Learning of Hypothetical and Counterfactual Interventions in Complex Systems »
Ricardo Silva · John Shawe-Taylor · Adith Swaminathan · Thorsten Joachims -
2016 Poster: Observational-Interventional Priors for Dose-Response Learning »
Ricardo Silva -
2016 Poster: Scaling Factorial Hidden Markov Models: Stochastic Variational Inference without Messages »
Yin Cheng Ng · Pawel M Chilinski · Ricardo Silva -
2014 Poster: Causal Inference through a Witness Protection Program »
Ricardo Silva · Robin Evans -
2013 Poster: Flexible sampling of discrete data correlations without the marginal distributions »
Alfredo Kalaitzis · Ricardo Silva -
2011 Poster: Thinning Measurement Models and Questionnaire Design »
Ricardo Silva -
2007 Poster: Hidden Common Cause Relations in Relational Learning »
Ricardo Silva · Wei Chu · Zoubin Ghahramani -
2007 Spotlight: Hidden Common Cause Relations in Relational Learning »
Ricardo Silva · Wei Chu · Zoubin Ghahramani