Workshop: Distribution shifts: connecting methods and applications (DistShift)

Learning Invariant Representations with Missing Data

Mark Goldstein · Adriel Saporta · Aahlad Puli · Rajesh Ranganath · Andrew Miller


Spurious correlations allow deep models to predict well during training but poorly on related test populations. Recent work has shown that models satisfying particular independencies involving the correlation-inducing nuisance variable have guarantees on their test performance. However, enforcing these independencies requires the nuisance to be observed during training, and nuisances such as demographics or image background labels are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. In this work, we derive missing-mmd, an estimator for invariance objectives when nuisances are missing. In simulations, missing-mmds enable test-performance improvements similar to those achieved with fully observed data.
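The abstract does not spell out the missing-mmd estimator. The sketch below is a hedged illustration only, under several assumptions: the nuisance z is binary, it is missing at random given observed covariates, and the probability pi that each row's nuisance is observed is known. Rows whose nuisance is observed are reweighted by 1/pi so that they stand in for the full population. A kernel MMD is then computed between the representations of the two nuisance groups. The function names, the RBF kernel, and the bandwidth are hypothetical choices. This inverse-probability-weighted (IPW) variant is not necessarily the estimator derived in the paper.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    # Gaussian (RBF) kernel matrix between the rows of a and the rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-sq / (2 * bandwidth**2))


def weighted_mmd2(x, y, wx, wy, bandwidth=1.0):
    # Biased (V-statistic) squared MMD between two weighted empirical distributions.
    wx = wx / wx.sum()
    wy = wy / wy.sum()
    kxx = wx @ rbf_kernel(x, x, bandwidth) @ wx
    kyy = wy @ rbf_kernel(y, y, bandwidth) @ wy
    kxy = wx @ rbf_kernel(x, y, bandwidth) @ wy
    return kxx + kyy - 2 * kxy


def ipw_missing_mmd2(reps, z, observed, pi, bandwidth=1.0):
    # Hypothetical IPW sketch (assumption, not the paper's exact estimator).
    # reps: (n, d) learned representations
    # z: (n,) binary nuisance; values in missing rows are ignored
    # observed: (n,) bool, True where z was recorded
    # pi: (n,) assumed-known probability that z is observed given covariates
    w = observed / pi                      # inverse-probability weights, 0 for missing rows
    g0 = observed & (z == 0)
    g1 = observed & (z == 1)
    return weighted_mmd2(reps[g0], reps[g1], w[g0], w[g1], bandwidth)
```

In a training loop, a penalty like `ipw_missing_mmd2` would be added to the prediction loss so that the representation distribution is pushed toward being the same across nuisance groups in the reweighted population, not just among the rows where the nuisance happens to be observed.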
