
Workshop: Distribution shifts: connecting methods and applications (DistShift)

Calibrated Ensembles: A Simple Way to Mitigate ID-OOD Accuracy Tradeoffs

Ananya Kumar · Aditi Raghunathan · Tengyu Ma · Percy Liang


Robust machine learning often exhibits an undesirable tradeoff in which out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy. A ‘robust’ classifier obtained via specialized techniques, such as removing spurious features, has better OOD but worse ID accuracy than a ‘standard’ classifier trained via vanilla ERM. On six distribution shift datasets, we find that simply ensembling the standard and robust models is a strong baseline: it matches the ID accuracy of the standard model with only a small drop in OOD accuracy compared to the robust model. Moreover, calibrating these models in-domain surprisingly improves the OOD accuracy of the ensemble and completely eliminates the tradeoff, so that we achieve the best of both ID and OOD accuracy over the original models.
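
The calibrate-then-ensemble idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that calibration is done by temperature scaling on held-out ID data and that the ensemble averages the two models' calibrated probabilities, neither of which the abstract specifies. The function names (`fit_temperature`, `calibrated_ensemble`) and the grid search over temperatures are hypothetical choices made for this example.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def nll(logits, labels, temperature):
    # Mean negative log-likelihood of the labels after temperature scaling.
    logp = log_softmax(logits / temperature)
    return -logp[np.arange(len(labels)), labels].mean()


def fit_temperature(logits, labels, grid=np.logspace(-2, 2, 200)):
    # Assumption: calibrate by picking the temperature that minimizes NLL
    # on in-distribution validation data (simple grid search).
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


def calibrated_ensemble(std_logits, rob_logits, t_std, t_rob):
    # Assumption: ensemble by averaging the calibrated class probabilities
    # of the standard and robust models.
    p_std = np.exp(log_softmax(std_logits / t_std))
    p_rob = np.exp(log_softmax(rob_logits / t_rob))
    return 0.5 * (p_std + p_rob)
```

In this sketch, each model's temperature would be fitted once on ID validation logits and then reused unchanged on OOD inputs; the predicted class is the argmax of the averaged probabilities.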
