

Poster in Workshop: Bayesian Deep Learning

Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks

Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal


Abstract:

Bayesian deep learning seeks to equip deep neural networks with the ability to precisely quantify their predictive uncertainty, and it promises to make deep learning more reliable for safety-critical real-world applications. Yet existing Bayesian deep learning methods fall short of this promise: new methods continue to be evaluated on unrealistic test beds that do not reflect the complexities of the downstream real-world tasks that would benefit most from reliable uncertainty quantification. We propose a set of real-world tasks that accurately reflect such complexities and assess the reliability of predictive models in safety-critical scenarios. Specifically, we curate two publicly available datasets of high-resolution human retina images exhibiting varying degrees of diabetic retinopathy, a medical condition that can lead to blindness, and use them to design a suite of automated diagnosis tasks that require reliable predictive uncertainty quantification. We use these tasks to benchmark well-established and state-of-the-art Bayesian deep learning methods on task-specific evaluation metrics. Finally, we provide an easy-to-use codebase, built following reproducibility and software design principles, that enables fast benchmarking.
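To make the kind of task-specific evaluation described above concrete, the sketch below shows uncertainty-based referral (selective prediction), a standard protocol for assessing predictive uncertainty in diagnosis settings: the most uncertain examples are referred to a human expert, and accuracy is measured on the retained, most confident cases. All function names and the synthetic data are illustrative assumptions, not the paper's actual codebase or API.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of the mean predictive distribution.

    probs: array of shape (n_samples, n_examples, n_classes) holding
    softmax outputs from, e.g., MC dropout or a deep ensemble.
    """
    mean_probs = probs.mean(axis=0)  # average over stochastic forward passes
    return -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=-1)

def accuracy_at_retention(probs, labels, retention):
    """Accuracy on the `retention` fraction of least-uncertain examples."""
    entropy = predictive_entropy(probs)
    preds = probs.mean(axis=0).argmax(axis=-1)
    order = np.argsort(entropy)                 # most confident first
    kept = order[: max(1, int(retention * len(labels)))]
    return (preds[kept] == labels[kept]).mean()

# Toy example: 5 stochastic forward passes over 100 binary examples,
# with logits weakly informative about the true label.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=100)
logits = rng.normal(size=(5, 100, 2)) + 2.0 * np.eye(2)[labels]
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

for r in (0.5, 0.9, 1.0):
    print(f"retention {r:.0%}: accuracy {accuracy_at_retention(probs, labels, r):.3f}")
```

A model with well-calibrated uncertainty should show accuracy increasing as the retention fraction shrinks, since the referred (dropped) examples are the ones it is most likely to get wrong.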
