Poster
Test-Time Adaptation Induces Stronger Accuracy and Agreement-on-the-Line
Eungyeup Kim · Mingjie Sun · Christina Baek · Aditi Raghunathan · J. Zico Kolter
East Exhibit Hall A-C #4501
Recently, Miller et al. (2021) and Baek et al. (2022) empirically demonstrated strong linear correlations between in-distribution (ID) and out-of-distribution (OOD) accuracy and agreement. These phenomena, termed accuracy-on-the-line (ACL) and agreement-on-the-line (AGL), often exhibit the same slope and bias, enabling OOD model selection and performance estimation without labeled data. However, the phenomena also break down under certain shifts, such as CIFAR10-C Gaussian Noise, posing a critical bottleneck in accurately predicting OOD performance without access to labels. In this paper, we make a key finding: recent OOD test-time adaptation (TTA) methods not only improve OOD performance, but also drastically strengthen the ACL and AGL phenomena, even under shifts that initially exhibited very weak correlations. To analyze this, we revisit the theoretical conditions established by Miller et al. (2021), which show that ACL appears in Gaussian data if the distributions shift only in mean and covariance scale. We find that these theoretical conditions hold when deep networks are adapted to CIFAR10-C data: the adapted models collapse the initially complex shifts into ones governed by a single "scaling" variable in the feature space. Building on these stronger linear trends, we demonstrate that combining TTA with AGL-based methods predicts OOD performance with higher precision than previous methods, and for a broader set of distribution shifts. Furthermore, we discover that models adapted with different hyperparameter settings exhibit the same linear trends. This allows us to perform hyperparameter selection on OOD data without relying on any labeled data.
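The label-free estimation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of a probit (inverse normal CDF) axis transform, and the plain least-squares fit are assumptions made for illustration. The sketch fits a line to ID versus OOD agreement, which needs no labels, and then reuses that slope and bias to map ID accuracy to an estimated OOD accuracy.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit transform


def probit(p):
    """Inverse standard-normal CDF; p must lie strictly between 0 and 1."""
    return _STD_NORMAL.inv_cdf(p)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + bias."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_ood_accuracy(id_agreement, ood_agreement, id_accuracy):
    """Hypothetical helper: estimate OOD accuracy without OOD labels.

    id_agreement / ood_agreement: agreement rates of model pairs on
    unlabeled ID and OOD data (one entry per pair).
    id_accuracy: ID accuracies of the models to estimate.
    Assumes the agreement line and the accuracy line share slope and bias.
    """
    slope, bias = fit_line(
        [probit(a) for a in id_agreement],
        [probit(a) for a in ood_agreement],
    )
    # Apply the agreement-derived line to ID accuracy, then undo the probit.
    return [_STD_NORMAL.cdf(slope * probit(acc) + bias) for acc in id_accuracy]
```

The estimate is only as good as the linear trend itself, which is why strengthening ACL and AGL through adaptation matters for this kind of prediction.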