Timezone: »
The loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves, over which training and test accuracy are nearly constant. We introduce a training procedure to discover these high-accuracy pathways between modes. Inspired by this new geometric insight, we also propose a new ensembling method entitled Fast Geometric Ensembling (FGE). Using FGE we can train high-performing ensembles in the time required to train a single model. We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles, on CIFAR-10, CIFAR-100, and ImageNet.
Author Information
Timur Garipov (Moscow State University)
Pavel Izmailov (Cornell University)
Dmitrii Podoprikhin (XTX markets)
Dmitry Vetrov (Higher School of Economics, Samsung AI Center, Moscow)
Andrew Wilson (Cornell University)

I am a professor of machine learning at New York University.
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs »
Tue. Dec 4th through Wed the 5th Room Room 517 AB #162
More from the Same Authors
-
2021 : Robust Reinforcement Learning for Shifting Dynamics During Deployment »
Samuel Stanton · Rasool Fakoor · Jonas Mueller · Andrew Gordon Wilson · Alexander Smola -
2022 Poster: HyperDomainNet: Universal Domain Adaptation for Generative Adversarial Networks »
Aibek Alanov · Vadim Titov · Dmitry Vetrov -
2022 : Transfer Learning with Deep Tabular Models »
Roman Levin · Valeriia Cherepanova · Avi Schwarzschild · Arpit Bansal · C. Bayan Bruss · Tom Goldstein · Andrew Wilson · Micah Goldblum -
2022 : On Representation Learning Under Class Imbalance »
Ravid Shwartz-Ziv · Micah Goldblum · Yucen Li · C. Bayan Bruss · Andrew Gordon Wilson -
2023 Poster: Understanding the detrimental class-level effects of data augmentation »
Polina Kirichenko · Mark Ibrahim · Randall Balestriero · Diane Bouchacourt · Shanmukha Ramakrishna Vedantam · Hamed Firooz · Andrew Wilson -
2023 Poster: Star-Shaped Denoising Diffusion Probabilistic Models »
Andrey Okhotin · Dmitry Molchanov · Arkhipkin Vladimir · Grigory Bartosh · Viktor Ohanesian · Aibek Alanov · Dmitry Vetrov -
2023 Poster: Compositional Sculpting of Iterative Generative Processes »
Timur Garipov · Sebastiaan De Peuter · Ge Yang · Vikas Garg · Samuel Kaski · Tommi Jaakkola -
2023 Poster: Large Language Models Are Zero Shot Time Series Forecasters »
Marc Finzi · Nate Gruver · Shikai Qiu · Andrew Wilson -
2023 Poster: Simplifying Neural Network Training Under Class Imbalance »
Ravid Shwartz-Ziv · Micah Goldblum · Yucen Li · C. Bayan Bruss · Andrew Wilson -
2023 Poster: Entropic Neural Optimal Transport via Diffusion Processes »
Nikita Gushchin · Alexander Kolesov · Alexander Korotin · Dmitry Vetrov · Evgeny Burnaev -
2023 Poster: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra »
Andres Potapczynski · Marc Finzi · Geoff Pleiss · Andrew Wilson -
2023 Poster: To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning »
Ildus Sadrtdinov · Dmitrii Pozdeev · Dmitry Vetrov · Ekaterina Lobacheva -
2023 Poster: Protein Design with Guided Discrete Diffusion »
Nate Gruver · Samuel Stanton · Nathan Frey · Tim G. J. Rudner · Isidro Hotzel · Julien Lafrance-Vanasse · Arvind Rajpal · Kyunghyun Cho · Andrew Wilson -
2023 Poster: Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution »
Tim G. J. Rudner · Ying Wang · Andrew Wilson -
2023 Poster: Should We Learn Most Likely Functions or Parameters? »
Tim G. J. Rudner · Sanyam Kapoor · Shikai Qiu · Andrew Wilson -
2023 Poster: A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning »
Valeriia Cherepanova · Gowthami Somepalli · Jonas Geiping · C. Bayan Bruss · Andrew Wilson · Tom Goldstein · Micah Goldblum -
2023 Poster: Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks »
Micah Goldblum · Hossein Souri · Renkun Ni · Manli Shu · Viraj Prabhu · Gowthami Somepalli · Prithvijit Chattopadhyay · Adrien Bardes · Mark Ibrahim · Judy Hoffman · Rama Chellappa · Andrew Wilson · Tom Goldstein -
2023 Oral: Entropic Neural Optimal Transport via Diffusion Processes »
Nikita Gushchin · Alexander Kolesov · Alexander Korotin · Dmitry Vetrov · Evgeny Burnaev -
2022 Spotlight: Lightning Talks 3B-2 »
Yu Huang · Tero Karras · Maxim Kodryan · Shiau Hong Lim · Shudong Huang · Ziyu Wang · Siqiao Xue · ILYAS MALIK · Ekaterina Lobacheva · Miika Aittala · Hongjie Wu · Yuhao Zhou · Yingbin Liang · Xiaoming Shi · Jun Zhu · Maksim Nakhodnov · Timo Aila · Yazhou Ren · James Zhang · Longbo Huang · Dmitry Vetrov · Ivor Tsang · Hongyuan Mei · Samuli Laine · Zenglin Xu · Wentao Feng · Jiancheng Lv -
2022 Spotlight: HyperDomainNet: Universal Domain Adaptation for Generative Adversarial Networks »
Aibek Alanov · Vadim Titov · Dmitry Vetrov -
2022 Spotlight: Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes »
Maxim Kodryan · Ekaterina Lobacheva · Maksim Nakhodnov · Dmitry Vetrov -
2022 Spotlight: Lightning Talks 3B-1 »
Tianying Ji · Tongda Xu · Giulia Denevi · Aibek Alanov · Martin Wistuba · Wei Zhang · Yuesong Shen · Massimiliano Pontil · Vadim Titov · Yan Wang · Yu Luo · Daniel Cremers · Yanjun Han · Arlind Kadra · Dailan He · Josif Grabocka · Zhengyuan Zhou · Fuchun Sun · Carlo Ciliberto · Dmitry Vetrov · Mingxuan Jing · Chenjian Gao · Aaron Flores · Tsachy Weissman · Han Gao · Fengxiang He · Kunzan Liu · Wenbing Huang · Hongwei Qin -
2022 : Andrew Gordon Wilson: When Bayesian Orthodoxy Can Go Wrong: Model Selection and Out-of-Distribution Generalization »
Andrew Gordon Wilson -
2022 : Andrew Gordon Wilson: When Bayesian Orthodoxy Can Go Wrong: Model Selection and Out-of-Distribution Generalization »
Andrew Gordon Wilson -
2022 : Transfer Learning with Deep Tabular Models »
Roman Levin · Valeriia Cherepanova · Avi Schwarzschild · Arpit Bansal · C. Bayan Bruss · Tom Goldstein · Andrew Wilson · Micah Goldblum -
2022 Poster: Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers »
Wanqian Yang · Polina Kirichenko · Micah Goldblum · Andrew Wilson -
2022 Poster: On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification »
Sanyam Kapoor · Wesley Maddox · Pavel Izmailov · Andrew Wilson -
2022 Poster: Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors »
Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson -
2022 Poster: Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes »
Maxim Kodryan · Ekaterina Lobacheva · Maksim Nakhodnov · Dmitry Vetrov -
2022 Poster: On Feature Learning in the Presence of Spurious Correlations »
Pavel Izmailov · Polina Kirichenko · Nate Gruver · Andrew Wilson -
2022 Poster: PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization »
Sanae Lotfi · Marc Finzi · Sanyam Kapoor · Andres Potapczynski · Micah Goldblum · Andrew Wilson -
2021 Workshop: Bayesian Deep Learning »
Yarin Gal · Yingzhen Li · Sebastian Farquhar · Christos Louizos · Eric Nalisnick · Andrew Gordon Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2021 Poster: Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces »
Kirill Struminsky · Artyom Gadetsky · Denis Rakitin · Danil Karpushkin · Dmitry Vetrov -
2021 : Evaluating Approximate Inference in Bayesian Deep Learning + Q&A »
Andrew Gordon Wilson · Pavel Izmailov · Matthew Hoffman · Yarin Gal · Yingzhen Li · Melanie F. Pradier · Sharad Vikram · Andrew Foong · Sanae Lotfi · Sebastian Farquhar -
2021 Poster: On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay »
Ekaterina Lobacheva · Maxim Kodryan · Nadezhda Chirkova · Andrey Malinin · Dmitry Vetrov -
2021 Poster: Residual Pathway Priors for Soft Equivariance Constraints »
Marc Finzi · Gregory Benton · Andrew Wilson -
2021 Poster: Does Knowledge Distillation Really Work? »
Samuel Stanton · Pavel Izmailov · Polina Kirichenko · Alexander Alemi · Andrew Wilson -
2021 Poster: Dangers of Bayesian Model Averaging under Covariate Shift »
Pavel Izmailov · Patrick Nicholson · Sanae Lotfi · Andrew Wilson -
2021 Poster: Conditioning Sparse Variational Gaussian Processes for Online Decision-making »
Wesley Maddox · Samuel Stanton · Andrew Wilson -
2021 Poster: Bayesian Optimization with High-Dimensional Outputs »
Wesley Maddox · Maximilian Balandat · Andrew Wilson · Eytan Bakshy -
2020 Poster: Bayesian Deep Learning and a Probabilistic Perspective of Generalization »
Andrew Wilson · Pavel Izmailov -
2020 Poster: Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints »
Marc Finzi · Ke Alexander Wang · Andrew Wilson -
2020 Poster: On Power Laws in Deep Ensembles »
Ekaterina Lobacheva · Nadezhda Chirkova · Maxim Kodryan · Dmitry Vetrov -
2020 Spotlight: On Power Laws in Deep Ensembles »
Ekaterina Lobacheva · Nadezhda Chirkova · Maxim Kodryan · Dmitry Vetrov -
2020 Spotlight: Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints »
Marc Finzi · Ke Alexander Wang · Andrew Wilson -
2020 Poster: BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization »
Maximilian Balandat · Brian Karrer · Daniel Jiang · Samuel Daulton · Ben Letham · Andrew Wilson · Eytan Bakshy -
2020 Poster: Learning Invariances in Neural Networks from Training Data »
Gregory Benton · Marc Finzi · Pavel Izmailov · Andrew Wilson -
2020 Poster: Improving GAN Training with Probability Ratio Clipping and Sample Reweighting »
Yue Wu · Pan Zhou · Andrew Wilson · Eric Xing · Zhiting Hu -
2020 Poster: Why Normalizing Flows Fail to Detect Out-of-Distribution Data »
Polina Kirichenko · Pavel Izmailov · Andrew Wilson -
2019 Workshop: Learning with Rich Experience: Integration of Learning Paradigms »
Zhiting Hu · Andrew Wilson · Chelsea Finn · Lisa Lee · Taylor Berg-Kirkpatrick · Ruslan Salakhutdinov · Eric Xing -
2019 Poster: The Implicit Metropolis-Hastings Algorithm »
Kirill Neklyudov · Evgenii Egorov · Dmitry Vetrov -
2019 Poster: Exact Gaussian Processes on a Million Data Points »
Ke Alexander Wang · Geoff Pleiss · Jacob Gardner · Stephen Tyree · Kilian Weinberger · Andrew Gordon Wilson -
2019 Poster: Function-Space Distributions over Kernels »
Gregory Benton · Wesley Maddox · Jayson Salkey · Julio Albinati · Andrew Gordon Wilson -
2019 Poster: Importance Weighted Hierarchical Variational Inference »
Artem Sobolev · Dmitry Vetrov -
2019 Poster: A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models »
Maxim Kuznetsov · Daniil Polykovskiy · Dmitry Vetrov · Alex Zhebrak -
2019 Poster: A Simple Baseline for Bayesian Uncertainty in Deep Learning »
Wesley Maddox · Pavel Izmailov · Timur Garipov · Dmitry Vetrov · Andrew Gordon Wilson -
2018 : TBC 2 »
Dmitry Vetrov -
2018 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Andrew Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2018 Poster: Scaling Gaussian Process Regression with Derivatives »
David Eriksson · Kun Dong · Eric Lee · David Bindel · Andrew Wilson -
2018 Poster: GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration »
Jacob Gardner · Geoff Pleiss · Kilian Weinberger · David Bindel · Andrew Wilson -
2018 Spotlight: GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration »
Jacob Gardner · Geoff Pleiss · Kilian Weinberger · David Bindel · Andrew Wilson -
2017 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Andrew Wilson · Andrew Wilson · Diederik Kingma · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2017 Symposium: Interpretable Machine Learning »
Andrew Wilson · Jason Yosinski · Patrice Simard · Rich Caruana · William Herlands -
2017 Poster: Bayesian GAN »
Yunus Saatci · Andrew Wilson -
2017 Spotlight: Bayesian GANs »
Yunus Saatci · Andrew Wilson -
2017 Poster: Bayesian Optimization with Gradients »
Jian Wu · Matthias Poloczek · Andrew Wilson · Peter Frazier -
2017 Poster: Scalable Log Determinants for Gaussian Process Kernel Learning »
Kun Dong · David Eriksson · Hannes Nickisch · David Bindel · Andrew Wilson -
2017 Oral: Bayesian Optimization with Gradients »
Jian Wu · Matthias Poloczek · Andrew Wilson · Peter Frazier -
2017 Poster: Scalable Levy Process Priors for Spectral Kernel Learning »
Phillip Jang · Andrew Loeb · Matthew Davidow · Andrew Wilson -
2017 Poster: Structured Bayesian Pruning via Log-Normal Multiplicative Noise »
Kirill Neklyudov · Dmitry Molchanov · Arsenii Ashukha · Dmitry Vetrov -
2016 Workshop: Interpretable Machine Learning for Complex Systems »
Andrew Wilson · Been Kim · William Herlands -
2016 Poster: PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions »
Mikhail Figurnov · Aizhan Ibraimova · Dmitry Vetrov · Pushmeet Kohli -
2016 Poster: Stochastic Variational Deep Kernel Learning »
Andrew Wilson · Zhiting Hu · Russ Salakhutdinov · Eric Xing -
2015 Workshop: Nonparametric Methods for Large Scale Representation Learning »
Andrew G Wilson · Alexander Smola · Eric Xing -
2015 Poster: M-Best-Diverse Labelings for Submodular Energies and Beyond »
Alexander Kirillov · Dmytro Shlezinger · Dmitry Vetrov · Carsten Rother · Bogdan Savchynskyy -
2015 Poster: Tensorizing Neural Networks »
Alexander Novikov · Dmitrii Podoprikhin · Anton Osokin · Dmitry Vetrov -
2015 Poster: The Human Kernel »
Andrew Wilson · Christoph Dann · Chris Lucas · Eric Xing -
2015 Spotlight: The Human Kernel »
Andrew Wilson · Christoph Dann · Chris Lucas · Eric Xing -
2014 Workshop: Modern Nonparametrics 3: Automating the Learning Pipeline »
Eric Xing · Mladen Kolar · Arthur Gretton · Samory Kpotufe · Han Liu · Zoltán Szabó · Alan Yuille · Andrew G Wilson · Ryan Tibshirani · Sasha Rakhlin · Damian Kozbur · Bharath Sriperumbudur · David Lopez-Paz · Kirthevasan Kandasamy · Francesco Orabona · Andreas Damianou · Wacha Bounliphone · Yanshuai Cao · Arijit Das · Yingzhen Yang · Giulia DeSalvo · Dmitry Storcheus · Roberto Valerio -
2014 Poster: Fast Kernel Learning for Multidimensional Pattern Extrapolation »
Andrew Wilson · Elad Gilboa · John P Cunningham · Arye Nehorai -
2010 Spotlight: Copula Processes »
Andrew Wilson · Zoubin Ghahramani -
2010 Poster: Copula Processes »
Andrew Wilson · Zoubin Ghahramani