
A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias
Puja Trivedi · Danai Koutra · Jayaraman Thiagarajan
Event URL: https://openreview.net/forum?id=o8ZPUm2Aw_

In order to achieve strong in-distribution (ID) and out-of-distribution (OOD) generalization during transfer learning, it was recently argued that adaptation protocols should better leverage the expressivity of high-quality, pretrained models by controlling feature distortion (FD), i.e., the failure to update features orthogonal to the ID data. However, in addition to OOD generalization, practical applications require that adapted models are also safe. To this end, we study the susceptibility of common adaptation protocols to simplicity bias (SB), i.e., the well-known propensity of neural networks to rely upon simple features, as this phenomenon has recently been shown to underlie several problems in safe generalization. Using a controllable, synthetic setting, we demonstrate that controlling FD alone is not sufficient to avoid SB, which harms safe generalization. Given the need to control both SB and FD for improved safety and ID/OOD generalization, we propose modifying a recently proposed protocol with the goal of reducing SB. We verify the effectiveness of these modified protocols in decreasing SB in the synthetic setting, and in jointly improving OOD generalization and safety on standard adaptation benchmarks.
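To build intuition for simplicity bias, the following minimal sketch (purely illustrative; it is not the paper's synthetic setting, and all feature names and values are invented assumptions) trains a linear classifier on data where two features both predict the label: a "simple" large-margin feature and a "complex" low-margin, noisy feature. Gradient descent on the logistic loss ends up relying almost entirely on the simple feature, which is the kind of reliance that can harm OOD and safe generalization if that feature shifts at test time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.choice([-1.0, 1.0], size=n)

# "Simple" feature: large margin, nearly noiseless, fully predictive of y.
x_simple = y * 1.0 + 0.05 * rng.standard_normal(n)
# "Complex" feature: also predictive of y, but small margin and noisy.
x_complex = y * 0.2 + 0.5 * rng.standard_normal(n)
X = np.stack([x_simple, x_complex], axis=1)

# Plain gradient descent on the logistic loss for a linear model w.
w = np.zeros(2)
lr = 0.1
for _ in range(500):
    margins = y * (X @ w)
    # d/dw of mean log(1 + exp(-y * <w, x>))
    grad = -(y[:, None] * X * (1.0 / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad

# The learned weight on the simple feature dominates the complex one,
# even though both features carry label information.
print(w)
```

Because the model leans on the simple feature, any distribution shift that corrupts it (while leaving the complex feature intact) would sharply degrade accuracy; this is the failure mode that controlling FD alone does not rule out.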

Author Information

Puja Trivedi (University of Michigan)
Danai Koutra (University of Michigan)
Jayaraman Thiagarajan (Lawrence Livermore National Labs)
