We propose SWA-Gaussian (SWAG), a simple, scalable, and general-purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low-rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out-of-sample detection, calibration, and transfer learning, in comparison to many popular alternatives including variational inference, MC dropout, KFAC Laplace, and temperature scaling.
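The procedure described in the abstract (running moments of the SGD iterates, a diagonal plus low-rank covariance, then sampling weights for Bayesian model averaging) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class name `SwagPosterior`, the helper `bayesian_model_average`, the `max_rank` parameter, and the equal 1/2 weighting of the diagonal and low-rank terms are all assumptions made for this example, and the weights are treated as a plain flat list rather than a real network.

```python
import math
import random


class SwagPosterior:
    """Toy SWAG-style Gaussian over a flat weight vector (hypothetical helper)."""

    def __init__(self, dim, max_rank=20):
        self.n = 0                    # number of iterates collected so far
        self.mean = [0.0] * dim       # running first moment (the SWA solution)
        self.sq_mean = [0.0] * dim    # running second moment, for the diagonal
        self.deviations = []          # last `max_rank` deviations from the mean
        self.max_rank = max_rank

    def collect(self, weights):
        """Update the running moments with one SGD iterate."""
        self.n += 1
        for i, w in enumerate(weights):
            self.mean[i] += (w - self.mean[i]) / self.n
            self.sq_mean[i] += (w * w - self.sq_mean[i]) / self.n
        # Deviation of this iterate from the current running mean.
        self.deviations.append([w - m for w, m in zip(weights, self.mean)])
        if len(self.deviations) > self.max_rank:
            self.deviations.pop(0)

    def sample(self, rng):
        """Draw one weight vector from the diagonal plus low-rank Gaussian."""
        k = len(self.deviations)
        z_low = [rng.gauss(0.0, 1.0) for _ in range(k)]
        out = []
        for i, m in enumerate(self.mean):
            # Diagonal variance estimate, clipped at zero for numerical safety.
            var = max(self.sq_mean[i] - m * m, 0.0)
            value = m + math.sqrt(var / 2.0) * rng.gauss(0.0, 1.0)
            if k > 1:
                # Low-rank term built from the stored deviations.
                low = sum(d[i] * z for d, z in zip(self.deviations, z_low))
                value += low / math.sqrt(2.0 * (k - 1))
            out.append(value)
        return out


def bayesian_model_average(posterior, predict_fn, x, n_samples, rng):
    """Average predict_fn(weights, x) over weights sampled from the posterior."""
    preds = [predict_fn(posterior.sample(rng), x) for _ in range(n_samples)]
    return [sum(col) / n_samples for col in zip(*preds)]
```

In use, `collect` would be called on the flattened weights at intervals during the SWA phase of training, and `predict_fn` would be any function that maps a weight vector and an input to a list of class probabilities. Keeping only the last `max_rank` deviations keeps the memory cost linear in the number of weights.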
Author Information
Wesley Maddox (New York University)
Pavel Izmailov (New York University)
Timur Garipov (MIT CSAIL)
Dmitry Vetrov (Higher School of Economics, Samsung AI Center, Moscow)
Andrew Gordon Wilson (New York University)
More from the Same Authors
-
2021 : Robust Reinforcement Learning for Shifting Dynamics During Deployment »
Samuel Stanton · Rasool Fakoor · Jonas Mueller · Andrew Gordon Wilson · Alexander Smola -
2021 : Optimizing High-Dimensional Physics Simulations via Composite Bayesian Optimization »
Wesley Maddox · Qing Feng · Maximilian Balandat -
2022 Poster: HyperDomainNet: Universal Domain Adaptation for Generative Adversarial Networks »
Aibek Alanov · Vadim Titov · Dmitry Vetrov -
2022 : On Representation Learning Under Class Imbalance »
Ravid Shwartz-Ziv · Micah Goldblum · Yucen Li · C. Bayan Bruss · Andrew Gordon Wilson -
2023 Poster: Star-Shaped Denoising Diffusion Probabilistic Models »
Andrey Okhotin · Dmitry Molchanov · Arkhipkin Vladimir · Grigory Bartosh · Viktor Ohanesian · Aibek Alanov · Dmitry Vetrov -
2023 Poster: Entropic Neural Optimal Transport via Diffusion Processes »
Nikita Gushchin · Alexander Kolesov · Alexander Korotin · Dmitry Vetrov · Evgeny Burnaev -
2023 Poster: To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning »
Ildus Sadrtdinov · Dmitrii Pozdeev · Dmitry Vetrov · Ekaterina Lobacheva -
2023 Oral: Entropic Neural Optimal Transport via Diffusion Processes »
Nikita Gushchin · Alexander Kolesov · Alexander Korotin · Dmitry Vetrov · Evgeny Burnaev -
2022 Spotlight: Lightning Talks 3B-2 »
Yu Huang · Tero Karras · Maxim Kodryan · Shiau Hong Lim · Shudong Huang · Ziyu Wang · Siqiao Xue · ILYAS MALIK · Ekaterina Lobacheva · Miika Aittala · Hongjie Wu · Yuhao Zhou · Yingbin Liang · Xiaoming Shi · Jun Zhu · Maksim Nakhodnov · Timo Aila · Yazhou Ren · James Zhang · Longbo Huang · Dmitry Vetrov · Ivor Tsang · Hongyuan Mei · Samuli Laine · Zenglin Xu · Wentao Feng · Jiancheng Lv -
2022 Spotlight: HyperDomainNet: Universal Domain Adaptation for Generative Adversarial Networks »
Aibek Alanov · Vadim Titov · Dmitry Vetrov -
2022 Spotlight: Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes »
Maxim Kodryan · Ekaterina Lobacheva · Maksim Nakhodnov · Dmitry Vetrov -
2022 Spotlight: Lightning Talks 3B-1 »
Tianying Ji · Tongda Xu · Giulia Denevi · Aibek Alanov · Martin Wistuba · Wei Zhang · Yuesong Shen · Massimiliano Pontil · Vadim Titov · Yan Wang · Yu Luo · Daniel Cremers · Yanjun Han · Arlind Kadra · Dailan He · Josif Grabocka · Zhengyuan Zhou · Fuchun Sun · Carlo Ciliberto · Dmitry Vetrov · Mingxuan Jing · Chenjian Gao · Aaron Flores · Tsachy Weissman · Han Gao · Fengxiang He · Kunzan Liu · Wenbing Huang · Hongwei Qin -
2022 : Andrew Gordon Wilson: When Bayesian Orthodoxy Can Go Wrong: Model Selection and Out-of-Distribution Generalization »
Andrew Gordon Wilson -
2022 Poster: On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification »
Sanyam Kapoor · Wesley Maddox · Pavel Izmailov · Andrew Wilson -
2022 Poster: Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes »
Maxim Kodryan · Ekaterina Lobacheva · Maksim Nakhodnov · Dmitry Vetrov -
2022 Poster: On Feature Learning in the Presence of Spurious Correlations »
Pavel Izmailov · Polina Kirichenko · Nate Gruver · Andrew Wilson -
2021 Workshop: Bayesian Deep Learning »
Yarin Gal · Yingzhen Li · Sebastian Farquhar · Christos Louizos · Eric Nalisnick · Andrew Gordon Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2021 Poster: Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces »
Kirill Struminsky · Artyom Gadetsky · Denis Rakitin · Danil Karpushkin · Dmitry Vetrov -
2021 : Evaluating Approximate Inference in Bayesian Deep Learning + Q&A »
Andrew Gordon Wilson · Pavel Izmailov · Matthew Hoffman · Yarin Gal · Yingzhen Li · Melanie F. Pradier · Sharad Vikram · Andrew Foong · Sanae Lotfi · Sebastian Farquhar -
2021 Poster: On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay »
Ekaterina Lobacheva · Maxim Kodryan · Nadezhda Chirkova · Andrey Malinin · Dmitry Vetrov -
2021 Poster: Does Knowledge Distillation Really Work? »
Samuel Stanton · Pavel Izmailov · Polina Kirichenko · Alexander Alemi · Andrew Wilson -
2021 Poster: Dangers of Bayesian Model Averaging under Covariate Shift »
Pavel Izmailov · Patrick Nicholson · Sanae Lotfi · Andrew Wilson -
2021 Poster: Conditioning Sparse Variational Gaussian Processes for Online Decision-making »
Wesley Maddox · Samuel Stanton · Andrew Wilson -
2021 Poster: Bayesian Optimization with High-Dimensional Outputs »
Wesley Maddox · Maximilian Balandat · Andrew Wilson · Eytan Bakshy -
2020 Poster: Bayesian Deep Learning and a Probabilistic Perspective of Generalization »
Andrew Wilson · Pavel Izmailov -
2020 Poster: On Power Laws in Deep Ensembles »
Ekaterina Lobacheva · Nadezhda Chirkova · Maxim Kodryan · Dmitry Vetrov -
2020 Spotlight: On Power Laws in Deep Ensembles »
Ekaterina Lobacheva · Nadezhda Chirkova · Maxim Kodryan · Dmitry Vetrov -
2020 Poster: Learning Invariances in Neural Networks from Training Data »
Gregory Benton · Marc Finzi · Pavel Izmailov · Andrew Wilson -
2020 Poster: Why Normalizing Flows Fail to Detect Out-of-Distribution Data »
Polina Kirichenko · Pavel Izmailov · Andrew Wilson -
2019 : Coffee/Poster session 2 »
Xingyou Song · Puneet Mangla · David Salinas · Zhenxun Zhuang · Leo Feng · Shell Xu Hu · Raul Puri · Wesley Maddox · Aniruddh Raghu · Prudencio Tossou · Mingzhang Yin · Ishita Dasgupta · Kangwook Lee · Ferran Alet · Zhen Xu · Jörg Franke · James Harrison · Jonathan Warrell · Guneet Dhillon · Arber Zela · Xin Qiu · Julien Niklas Siems · Russell Mendonca · Louis Schlessinger · Jeffrey Li · Georgiana Manolache · Debojyoti Dutta · Lucas Glass · Abhishek Singh · Gregor Koehler -
2019 Poster: The Implicit Metropolis-Hastings Algorithm »
Kirill Neklyudov · Evgenii Egorov · Dmitry Vetrov -
2019 Poster: Exact Gaussian Processes on a Million Data Points »
Ke Alexander Wang · Geoff Pleiss · Jacob Gardner · Stephen Tyree · Kilian Weinberger · Andrew Gordon Wilson -
2019 Poster: Function-Space Distributions over Kernels »
Gregory Benton · Wesley Maddox · Jayson Salkey · Julio Albinati · Andrew Gordon Wilson -
2019 Poster: Importance Weighted Hierarchical Variational Inference »
Artem Sobolev · Dmitry Vetrov -
2019 Poster: A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models »
Maxim Kuznetsov · Daniil Polykovskiy · Dmitry Vetrov · Alex Zhebrak -
2018 : TBC 2 »
Dmitry Vetrov -
2018 Poster: Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs »
Timur Garipov · Pavel Izmailov · Dmitrii Podoprikhin · Dmitry Vetrov · Andrew Wilson -
2018 Spotlight: Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs »
Timur Garipov · Pavel Izmailov · Dmitrii Podoprikhin · Dmitry Vetrov · Andrew Wilson -
2017 Poster: Structured Bayesian Pruning via Log-Normal Multiplicative Noise »
Kirill Neklyudov · Dmitry Molchanov · Arsenii Ashukha · Dmitry Vetrov -
2016 Poster: PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions »
Mikhail Figurnov · Aizhan Ibraimova · Dmitry Vetrov · Pushmeet Kohli -
2015 Poster: M-Best-Diverse Labelings for Submodular Energies and Beyond »
Alexander Kirillov · Dmytro Shlezinger · Dmitry Vetrov · Carsten Rother · Bogdan Savchynskyy -
2015 Poster: Tensorizing Neural Networks »
Alexander Novikov · Dmitrii Podoprikhin · Anton Osokin · Dmitry Vetrov