Workshop: Workshop on Distribution Shifts: Connecting Methods and Applications

Reliability benchmarks for image segmentation

Estefany Kelly Buchanan · Michael Dusenberry · Jie Ren · Kevin Murphy · Balaji Lakshminarayanan · Dustin Tran


Recent work has shown the importance of reliability, where model performance is assessed under stress conditions pervasive in real-world deployment. In this work, we examine reliability tasks in the setting of semantic segmentation, a dense output problem that has typically been evaluated only on in-distribution predictive performance, for example the mean intersection over union score on the Cityscapes validation set. To narrow the gap toward reliable deployment in the real world, we compile a benchmark of existing and newly constructed distribution shifts and metrics. We evaluate current models and several baselines to determine how well segmentation models make robust predictions across multiple types of distribution shift and flag when they don’t know.
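
For readers unfamiliar with the in-distribution metric named above, the following is a minimal sketch of mean intersection over union over flattened per-pixel class labels. It is not the benchmark's implementation: the function name `mean_iou`, the `ignore_label` parameter, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions. Dataset-level evaluations usually accumulate intersection and union counts over all images before dividing, which this per-call sketch does not do.

```python
from typing import Optional, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_label: Optional[int] = None) -> float:
    """Mean IoU over flattened per-pixel class labels (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labeled as class c

    for p, t in zip(pred, target):
        if t == ignore_label:
            # Assumption: pixels carrying the ignore label are excluded entirely.
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            # Classes absent from both prediction and ground truth are skipped.
            ious.append(intersection[c] / union)

    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return sum(ious) / len(ious)
```

As a small check, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` gives per-class IoUs of 1/2 and 2/3, so the mean is 7/12.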
