Workshop: Distribution shifts: connecting methods and applications (DistShift)

Thinking Beyond Distributions in Testing Machine Learned Models

Negar Rostamzadeh · Ben Hutchinson · Vinodkumar Prabhakaran


Testing within the machine learning (ML) community has centered around assessing a learned model's predictive performance measured against a test dataset. This test dataset is often drawn from the same distribution as the training dataset. While recent work on robustness testing within the ML community has pointed to the importance of testing against distributional shifts, these efforts also focus on estimating the likelihood of the model making an error against a reference dataset or distribution. In this paper, we argue that this view of testing actively discourages researchers and developers from looking into many other sources of robustness failures, for instance, corner cases that may have severe impacts. We draw parallels with decades of work in software engineering testing that has focused on assessing a software system against various stress conditions, including corner cases, rather than solely on average-case behaviour. Finally, we put forth a set of recommendations to broaden the view of machine learning testing toward a rigorous practice.
