Timezone: »
Bayesian deep learning seeks to equip deep neural networks with the ability to precisely quantify their predictive uncertainty, and has promised to make deep learning more reliable for safety-critical real-world applications. Yet, existing Bayesian deep learning methods fall short of this promise; new methods continue to be evaluated on unrealistic test beds that do not reflect the complexities of the downstream real-world tasks that would benefit most from reliable uncertainty quantification. We propose a set of real-world tasks that accurately reflect such complexities and assess the reliability of predictive models in safety-critical scenarios. Specifically, we curate two publicly available datasets of high-resolution human retina images exhibiting varying degrees of diabetic retinopathy, a medical condition that can lead to blindness, and use them to design a suite of automated diagnosis tasks that require reliable predictive uncertainty quantification. We use these tasks to benchmark well-established and state-of-the-art Bayesian deep learning methods on task-specific evaluation metrics. We provide an easy-to-use codebase for fast and easy benchmarking following reproducibility and software design principles. We provide implementations of all methods included in the benchmark as well as results computed over 100 TPU days, 20 GPU days, 400 hyperparameter configurations, and evaluation on at least 6 random seeds each.
Author Information
Neil Band (University of Oxford)
Tim G. J. Rudner (University of Oxford)
Qixuan Feng (University of Oxford)
Angelos Filos (University of Oxford)
Zachary Nado (Google Inc.)
Mike Dusenberry (Google Research, Brain team)
Ghassen Jerfel (Duke University)
Dustin Tran (Google Brain)
Yarin Gal (University of Oxford)

Yarin leads the Oxford Applied and Theoretical Machine Learning (OATML) group. He is an Associate Professor of Machine Learning at the Computer Science department, University of Oxford. He is also the Tutorial Fellow in Computer Science at Christ Church, Oxford, and a Turing Fellow at the Alan Turing Institute, the UK’s national institute for data science and artificial intelligence. Prior to his move to Oxford he was a Research Fellow in Computer Science at St Catharine’s College at the University of Cambridge. He obtained his PhD from the Cambridge machine learning group, working with Prof Zoubin Ghahramani and funded by the Google Europe Doctoral Fellowship. He made substantial contributions to early work in modern Bayesian deep learning—quantifying uncertainty in deep learning—and developed ML/AI tools that can inform their users when the tools are “guessing at random”. These tools have been deployed widely in industry and academia, with the tools used in medical applications, robotics, computer vision, astronomy, in the sciences, and by NASA. Beyond his academic work, Yarin works with industry on deploying robust ML tools safely and responsibly. He co-chairs the NASA FDL AI committee, and is an advisor with Canadian medical imaging company Imagia, Japanese robotics company Preferred Networks, as well as numerous startups.
Related Events (a corresponding poster, oral, or spotlight)
-
2021 : Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks »
Mon. Dec 13th 08:20 -- 08:30 PM Room
More from the Same Authors
-
2020 : Paper 40: Real2sim: Automatic Generation of Open Street Map Towns For Autonomous Driving Benchmarks »
Panagiotis Tigas · Yarin Gal -
2020 Meetup: MeetUp: Oxford, UK »
Yarin Gal -
2021 Spotlight: Speedy Performance Estimation for Neural Architecture Search »
Robin Ru · Clare Lyle · Lisa Schut · Miroslav Fil · Mark van der Wilk · Yarin Gal -
2021 : Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks »
Andrey Malinin · Neil Band · Yarin Gal · Mark Gales · Alexander Ganshin · German Chesnokov · Alexey Noskov · Andrey Ploskonosov · Liudmila Prokhorenkova · Ivan Provilkov · Vatsal Raina · Vyas Raina · Denis Roginskiy · Mariya Shmatova · Panagiotis Tigas · Boris Yangel -
2021 : Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks »
Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal -
2021 : DeDUCE: Generating Counterfactual Explanations At Scale »
Benedikt Höltgen · Lisa Schut · Jan Brauner · Yarin Gal -
2021 : PCA Subspaces Are Not Always Optimal for Bayesian Learning »
Alexandre Bense · Amir Joudaki · Tim G. J. Rudner · Vincent Fortuin -
2021 : Using Non-Linear Causal Models to Study Aerosol-Cloud Interactions in the Southeast Pacific »
Andrew Jesson · Peter Manshausen · Alyson Douglas · Duncan Watson-Parris · Yarin Gal · Philip Stier -
2021 : DARTS without a Validation Set: Optimizing the Marginal Likelihood »
Miroslav Fil · Robin Ru · Clare Lyle · Yarin Gal -
2021 : Using Non-Linear Causal Models to StudyAerosol-Cloud Interactions in the Southeast Pacific »
Andrew Jesson · Peter Manshausen · Alyson Douglas · Duncan Watson-Parris · Yarin Gal · Philip Stier -
2021 : Can Network Flatness Explain the Training Speed-Generalisation Connection? »
Albert Q. Jiang · Clare Lyle · Lisa Schut · Yarin Gal -
2021 : Decomposing Representations for Deterministic Uncertainty Estimation »
Haiwen Huang · Joost van Amersfoort · Yarin Gal -
2021 : On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty »
Joost van Amersfoort · Lewis Smith · Andrew Jesson · Oscar Key · Yarin Gal -
2021 : Contrastive Representation Learning with Trainable Augmentation Channel »
Masanori Koyama · Kentaro Minami · Takeru Miyato · Yarin Gal -
2021 : Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning »
Zachary Nado · Neil Band · Mark Collier · Josip Djolonga · Mike Dusenberry · Sebastian Farquhar · Qixuan Feng · Angelos Filos · Marton Havasi · Rodolphe Jenatton · Ghassen Jerfel · Jeremiah Liu · Zelda Mariet · Jeremy Nixon · Shreyas Padhy · Jie Ren · Tim G. J. Rudner · Yeming Wen · Florian Wenzel · Kevin Murphy · D. Sculley · Balaji Lakshminarayanan · Jasper Snoek · Yarin Gal · Dustin Tran -
2021 : Deep Classifiers with Label Noise Modeling and Distance Awareness »
Vincent Fortuin · Mark Collier · Florian Wenzel · James Allingham · Jeremiah Liu · Dustin Tran · Balaji Lakshminarayanan · Jesse Berent · Rodolphe Jenatton · Effrosyni Kokiopoulou -
2021 : Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks »
Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal -
2022 : Discovering Long-period Exoplanets using Deep Learning with Citizen Science Labels »
Shreshth A Malik · Nora Eisner · Chris Lintott · Yarin Gal -
2022 : A Neural Tangent Kernel Perspective on Function-Space Regularization in Neural Networks »
Zonghao Chen · Xupeng Shi · Tim G. J. Rudner · Qixuan Feng · Weizhong Zhang · Tong Zhang -
2022 : Using uncertainty-aware machine learning models to study aerosol-cloud interactions »
Maëlys Solal · Andrew Jesson · Yarin Gal · Alyson Douglas -
2022 : TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction »
Pascal Notin · Lodevicus van Niekerk · Aaron Kollasch · Daniel Ritter · Yarin Gal · Debora Marks -
2022 : Reliability benchmarks for image segmentation »
Estefany Kelly Buchanan · Michael Dusenberry · Jie Ren · Kevin Murphy · Balaji Lakshminarayanan · Dustin Tran -
2022 : Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? »
Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal -
2022 : Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? »
Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal -
2022 : What 'Out-of-distribution' Is and Is Not »
Sebastian Farquhar · Yarin Gal -
2022 : Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation »
Lorenz Kuhn · Yarin Gal · Sebastian Farquhar -
2022 : Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? »
Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal -
2023 Poster: ProteinNPT: Improving protein property prediction and design with non-parametric transformers »
Pascal Notin · Ruben Weitzman · Debora Marks · Yarin Gal -
2023 Poster: ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design »
Pascal Notin · Aaron Kollasch · Daniel Ritter · Lodevicus van Niekerk · Nathan Rollins · Steffanie Paul · Ada Shaw · Ruben Weitzman · Jonathan Frazer · Mafalda Dias · Dinko Franceschi · Rose Orenbuch · Han Spinner · Yarin Gal · Debora Marks -
2022 : Benchmarking Trainng Algorithms by Zachary Nado »
Zachary Nado -
2022 Workshop: Has it Trained Yet? A Workshop for Algorithmic Efficiency in Practical Neural Network Training »
Frank Schneider · Zachary Nado · Philipp Hennig · George Dahl · Naman Agarwal -
2022 Poster: Tractable Function-Space Variational Inference in Bayesian Neural Networks »
Tim G. J. Rudner · Zonghao Chen · Yee Whye Teh · Yarin Gal -
2022 Poster: Scalable Sensitivity and Uncertainty Analyses for Causal-Effect Estimates of Continuous-Valued Interventions »
Andrew Jesson · Alyson Douglas · Peter Manshausen · Maëlys Solal · Nicolai Meinshausen · Philip Stier · Yarin Gal · Uri Shalit -
2022 Poster: Interventions, Where and How? Experimental Design for Causal Models at Scale »
Panagiotis Tigas · Yashas Annadani · Andrew Jesson · Bernhard Schölkopf · Yarin Gal · Stefan Bauer -
2022 Poster: Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation »
Jannik Kossen · Sebastian Farquhar · Yarin Gal · Thomas Rainforth -
2021 : Human-in-the-loop Bayesian Deep Learning »
Yarin Gal -
2021 : [S7] DeDUCE: Generating Counterfactual Explanations At Scale »
Benedikt Höltgen · Lisa Schut · Jan Brauner · Yarin Gal -
2021 Workshop: Bayesian Deep Learning »
Yarin Gal · Yingzhen Li · Sebastian Farquhar · Christos Louizos · Eric Nalisnick · Andrew Gordon Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2021 Poster: Speedy Performance Estimation for Neural Architecture Search »
Robin Ru · Clare Lyle · Lisa Schut · Miroslav Fil · Mark van der Wilk · Yarin Gal -
2021 : Evaluating Approximate Inference in Bayesian Deep Learning + Q&A »
Andrew Gordon Wilson · Pavel Izmailov · Matthew Hoffman · Yarin Gal · Yingzhen Li · Melanie F. Pradier · Sharad Vikram · Andrew Foong · Sanae Lotfi · Sebastian Farquhar -
2021 Poster: Outcome-Driven Reinforcement Learning via Variational Inference »
Tim G. J. Rudner · Vitchyr Pong · Rowan McAllister · Yarin Gal · Sergey Levine -
2021 Poster: Soft Calibration Objectives for Neural Networks »
Archit Karandikar · Nicholas Cain · Dustin Tran · Balaji Lakshminarayanan · Jonathon Shlens · Michael Mozer · Becca Roelofs -
2021 Poster: Improving black-box optimization in VAE latent space using decoder uncertainty »
Pascal Notin · José Miguel Hernández-Lobato · Yarin Gal -
2021 Poster: On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations »
Tim G. J. Rudner · Cong Lu · Michael A Osborne · Yarin Gal · Yee Teh -
2021 : Shifts Challenge: Robustness and Uncertainty under Real-World Distributional Shift + Q&A »
Andrey Malinin · Neil Band · German Chesnokov · Yarin Gal · Alexander Ganshin · Mark Gales · Alexey Noskov · Liudmila Prokhorenkova · Mariya Shmatova · Vyas Raina · Vatsal Raina · Panagiotis Tigas · Boris Yangel -
2021 Poster: Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data »
Andrew Jesson · Panagiotis Tigas · Joost van Amersfoort · Andreas Kirsch · Uri Shalit · Yarin Gal -
2021 Poster: Domain Invariant Representation Learning with Domain Density Transformations »
A. Tuan Nguyen · Toan Tran · Yarin Gal · Atilim Gunes Baydin -
2021 Poster: Self-Consistent Models and Values »
Greg Farquhar · Kate Baumli · Zita Marinho · Angelos Filos · Matteo Hessel · Hado van Hasselt · David Silver -
2021 Poster: Revisiting the Calibration of Modern Neural Networks »
Matthias Minderer · Josip Djolonga · Rob Romijnders · Frances Hubis · Xiaohua Zhai · Neil Houlsby · Dustin Tran · Mario Lucic -
2021 Poster: Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning »
Jannik Kossen · Neil Band · Clare Lyle · Aidan Gomez · Thomas Rainforth · Yarin Gal -
2020 Workshop: Talking to Strangers: Zero-Shot Emergent Communication »
Marie Ossenkopf · Angelos Filos · Abhinav Gupta · Michael Noukhovitch · Angeliki Lazaridou · Jakob Foerster · Kalesha Bullard · Rahma Chaabouni · Eugene Kharitonov · Roberto Dessì -
2020 Poster: Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations »
Sebastian Farquhar · Lewis Smith · Yarin Gal -
2020 Poster: A Bayesian Perspective on Training Speed and Model Selection »
Clare Lyle · Lisa Schut · Robin Ru · Yarin Gal · Mark van der Wilk -
2020 Poster: Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models »
Andrew Jesson · Sören Mindermann · Uri Shalit · Yarin Gal -
2020 Poster: How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19? »
Mrinank Sharma · Sören Mindermann · Jan Brauner · Gavin Leech · Anna Stephenson · Tomáš Gavenčiak · Jan Kulveit · Yee Whye Teh · Leonid Chindelevitch · Yarin Gal -
2020 Spotlight: How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19? »
Mrinank Sharma · Sören Mindermann · Jan Brauner · Gavin Leech · Anna Stephenson · Tomáš Gavenčiak · Jan Kulveit · Yee Whye Teh · Leonid Chindelevitch · Yarin Gal -
2019 : Poster session »
Sebastian Farquhar · Erik Daxberger · Andreas Look · Matt Benatan · Ruiyi Zhang · Marton Havasi · Fredrik Gustafsson · James A Brofos · Nabeel Seedat · Micha Livne · Ivan Ustyuzhaninov · Adam Cobb · Felix D McGregor · Patrick McClure · Tim R. Davidson · Gaurush Hiranandani · Sanjeev Arora · Masha Itkina · Didrik Nielsen · William Harvey · Matias Valdenegro-Toro · Stefano Peluchetti · Riccardo Moriconi · Tianyu Cui · Vaclav Smidl · Taylan Cemgil · Jack Fitzsimons · He Zhao · · mariana vargas vieyra · Apratim Bhattacharyya · Rahul Sharma · Geoffroy Dubourg-Felonneau · Jonathan Warrell · Slava Voloshynovskiy · Mihaela Rosca · Jiaming Song · Andrew Ross · Homa Fashandi · Ruiqi Gao · Hooshmand Shokri Razaghi · Joshua Chang · Zhenzhong Xiao · Vanessa Boehm · Giorgio Giannone · Ranganath Krishnan · Joe Davison · Arsenii Ashukha · Jeremiah Liu · Sicong (Sheldon) Huang · Evgenii Nikishin · Sunho Park · Nilesh Ahuja · Mahesh Subedar · · Artyom Gadetsky · Jhosimar Arias Figueroa · Tim G. J. Rudner · Waseem Aslam · Adrián Csiszárik · John Moberg · Ali Hebbal · Kathrin Grosse · Pekka Marttinen · Bang An · Hlynur Jónsson · Samuel Kessler · Abhishek Kumar · Mikhail Figurnov · Omesh Tickoo · Steindor Saemundsson · Ari Heljakka · Dániel Varga · Niklas Heim · Simone Rossi · Max Laves · Waseem Gharbieh · Nicholas Roberts · Luis Armando Pérez Rey · Matthew Willetts · Prithvijit Chakrabarty · Sumedh Ghaisas · Carl Shneider · Wray Buntine · Kamil Adamczewski · Xavier Gitiaux · Suwen Lin · Hao Fu · Gunnar Rätsch · Aidan Gomez · Erik Bodin · Dinh Phung · Lennart Svensson · Juliano Tusi Amaral Laganá Pinto · Milad Alizadeh · Jianzhun Du · Kevin Murphy · Beatrix Benkő · Shashaank Vattikuti · Jonathan Gordon · Christopher Kanan · Sontje Ihler · Darin Graham · Michael Teng · Louis Kirsch · Tomas Pevny · Taras Holotyak -
2019 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Eric Nalisnick · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2019 Poster: BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning »
Andreas Kirsch · Joost van Amersfoort · Yarin Gal -
2019 Poster: Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model »
Guodong Zhang · Lala Li · Zachary Nado · James Martens · Sushant Sachdeva · George Dahl · Chris Shallue · Roger Grosse -
2019 Poster: Bayesian Layers: A Module for Neural Network Uncertainty »
Dustin Tran · Mike Dusenberry · Mark van der Wilk · Danijar Hafner -
2019 Poster: Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift »
Jasper Snoek · Yaniv Ovadia · Emily Fertig · Balaji Lakshminarayanan · Sebastian Nowozin · D. Sculley · Joshua Dillon · Jie Ren · Zachary Nado -
2018 : TBC 15 »
Yarin Gal -
2018 : Invited Speaker #5 Yarin Gal »
Yarin Gal -
2018 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Andrew Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2018 : Opening Remarks »
Yarin Gal -
2018 Poster: BRUNO: A Deep Recurrent Model for Exchangeable Data »
Iryna Korshunova · Jonas Degrave · Ferenc Huszar · Yarin Gal · Arthur Gretton · Joni Dambre -
2017 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Andrew Wilson · Andrew Wilson · Diederik Kingma · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2017 Poster: Concrete Dropout »
Yarin Gal · Jiri Hron · Alex Kendall -
2017 Poster: What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? »
Alex Kendall · Yarin Gal -
2017 Spotlight: What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? »
Alex Kendall · Yarin Gal -
2017 Poster: Real Time Image Saliency for Black Box Classifiers »
Piotr Dabkowski · Yarin Gal -
2016 : Panel Discussion »
Shakir Mohamed · David Blei · Ryan Adams · José Miguel Hernández-Lobato · Ian Goodfellow · Yarin Gal -
2016 Workshop: Bayesian Deep Learning »
Yarin Gal · Christos Louizos · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2016 Poster: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks »
Yarin Gal · Zoubin Ghahramani -
2015 : Variational Gaussian Process »
Dustin Tran -
2015 Workshop: Advances in Approximate Bayesian Inference »
Dustin Tran · Tamara Broderick · Stephan Mandt · James McInerney · Shakir Mohamed · Alp Kucukelbir · Matthew D. Hoffman · Neil Lawrence · David Blei -
2015 Poster: Copula variational inference »
Dustin Tran · David Blei · Edo M Airoldi -
2014 Poster: Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models »
Yarin Gal · Mark van der Wilk · Carl Edward Rasmussen