Timezone: »
Benchmark datasets have traditionally served dual purposes: first, benchmarks offer a standard on which machine learning researchers can compare different methods, and second, benchmarks provide a model, albeit imperfect, of the real world. The incompleteness of test benchmarks (and the data upon which models are trained) hinder robustness in machine learning, enable shortcut learning, and leave models systematically prone to err on out-of-distribution and adversarially perturbed data. In an effort to clarify how to address the mismatch between test benchmarks and production data, we introduce context shift to describe semantically meaningful changes in the underlying data generation process. Moreover, we identify three methods for addressing context shift that would otherwise lead to model prediction errors: first, we describe how human intuition and expert knowledge can identify semantically meaningful features upon which models systematically fail, second, we detail how dynamic benchmarking – with its focus on capturing the data generation process – can promote generalizability through corroboration, and third, we highlight that clarifying a model's limitations can reduce unexpected errors.
Author Information
Matt Groh (MIT)
More from the Same Authors
-
2022 : Identifying the Context Shift between Test Benchmarks and Production Data »
Matt Groh -
2022 : Improving dermatology classifiers across populations using images generated by large diffusion models »
Luke Sagers · James Diao · Matt Groh · Pranav Rajpurkar · Adewole Adamson · Arjun Manrai -
2021 : Context in Automated Affect Recognition »
Matt Groh · Rosalind Picard -
2018 : Poster Session 1 (note there are numerous missing names here, all papers appear in all poster sessions) »
Akhilesh Gotmare · Kenneth Holstein · Jan Brabec · Michal Uricar · Kaleigh Clary · Cynthia Rudin · Sam Witty · Andrew Ross · Shayne O'Brien · Babak Esmaeili · Jessica Forde · Massimo Caccia · Ali Emami · Scott Jordan · Bronwyn Woods · D. Sculley · Rebekah Overdorf · Nicolas Le Roux · Peter Henderson · Brandon Yang · Tzu-Yu Liu · David Jensen · Niccolo Dalmasso · Weitang Liu · Paul Marc TRICHELAIR · Jun Ki Lee · Akanksha Atrey · Matt Groh · Yotam Hechtlinger · Emma Tosch