Reliable Models via Responsiveness Verification
Abstract
Many safety failures in machine learning arise when models are used to assign predictions to people, often in settings like lending, hiring, or content moderation, without accounting for how individuals can change their inputs under realistic constraints and imperfect data. In this work, we introduce a formal validation procedure for the responsiveness of predictions with respect to interventions on their features. Our procedure frames responsiveness as a type of sensitivity analysis in which practitioners control a set of changes by specifying constraints over interventions and distributions over downstream effects, allowing uncertainty from biased, truncated, or missing data to be made explicit. We describe how to estimate responsiveness for the predictions of any model and any dataset using only black-box access, and how to use these estimates to support tasks such as falsification and failure probability estimation. We develop algorithms that construct these estimates by generating a uniform sample of reachable points, and demonstrate how they can promote safety in real-world applications such as recidivism prediction, organ transplant prioritization, and content moderation.
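To make the black-box estimation idea above concrete, the following is a minimal sketch, not the paper's actual procedure: it draws a uniform sample of reachable points for a single individual under simple box constraints on interventions, queries the model on each point, and reports the fraction of points whose prediction differs from the baseline. The function name `estimate_responsiveness`, the box-constraint encoding, and the toy linear classifier are all illustrative assumptions.

```python
# A minimal sketch (assumed, not the paper's algorithm) of estimating
# responsiveness with only black-box access: sample reachable points
# uniformly under per-feature intervention bounds, query the model,
# and return the fraction of predictions that flip.
import numpy as np

def estimate_responsiveness(predict_fn, x, lower, upper, n_samples=1000, seed=0):
    """Estimate the probability that an intervention within [lower, upper]
    (coordinate-wise offsets added to x) changes the model's prediction."""
    rng = np.random.default_rng(seed)
    baseline = predict_fn(x.reshape(1, -1))[0]
    # Uniform sample of reachable points: x plus a uniformly drawn offset.
    offsets = rng.uniform(lower, upper, size=(n_samples, x.shape[0]))
    reachable = x + offsets
    preds = predict_fn(reachable)
    return float(np.mean(preds != baseline))

if __name__ == "__main__":
    # Toy black-box model: a fixed linear threshold classifier (assumed).
    w, b = np.array([1.0, -2.0, 0.5]), -0.25
    predict_fn = lambda X: (X @ w + b > 0).astype(int)

    x = np.array([0.2, 0.4, 0.1])        # one individual's features
    lower = np.array([0.0, -0.5, 0.0])   # maximum allowed decrease per feature
    upper = np.array([1.0, 0.0, 0.3])    # maximum allowed increase per feature

    p_flip = estimate_responsiveness(predict_fn, x, lower, upper)
    print(f"estimated responsiveness (flip probability): {p_flip:.3f}")
```

In this sketch the failure probability for a fixed individual is just a Monte Carlo average over sampled interventions; richer constraint sets and distributions over downstream effects would replace the uniform box sampling used here.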