
Workshop: Distribution shifts: connecting methods and applications (DistShift)

Catastrophic Failures of Neural Active Learning on Heteroskedastic Distributions

Savya Khosla · Alex Lamb · Jordan Ash · Cyril Zhang · Kenji Kawaguchi


Models that can actively seek out the best-quality training data hold the promise of more accurate, adaptable, and efficient machine learning. State-of-the-art techniques tend to prefer examples that are the most difficult to classify. While this works well on homogeneous datasets, we find that it can lead to catastrophic failures when performing active learning on multiple distributions with different degrees of label noise (heteroskedasticity). Most active learning algorithms strongly prefer to draw from the distribution with more noise, even if its examples have no informative structure (such as solid-color images). We find that active learning which encourages diversity and model uncertainty in the selected examples can significantly mitigate these failures. We hope these observations are immediately useful to practitioners and can lead to the construction of more realistic and challenging active learning benchmarks.
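The failure mode described above can be illustrated with a toy simulation. The sketch below is not the authors' experimental setup; it is a minimal, hypothetical example assuming a pool with two sub-distributions: a "clean" one with informative features, and a "noisy" one whose features carry no signal (all zeros, analogous to solid-color images) and whose labels are coin flips. Uncertainty sampling via predictive entropy then concentrates almost entirely on the uninformative noisy group.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Clean sub-distribution: two separable 2-D blobs with informative labels.
x_clean = np.vstack([rng.normal(-2, 0.5, (n, 2)), rng.normal(2, 0.5, (n, 2))])
y_clean = np.array([0] * n + [1] * n)

# Noisy sub-distribution: uninformative features, purely random labels
# (heteroskedastic label noise with no learnable structure).
x_noise = np.zeros((2 * n, 2))
y_noise = rng.integers(0, 2, 2 * n)

X = np.vstack([x_clean, x_noise])
y = np.concatenate([y_clean, y_noise]).astype(float)
is_noisy = np.array([False] * (2 * n) + [True] * (2 * n))

# Fit a logistic regression on the full pool by gradient descent.
Xb = np.hstack([X, np.ones((len(X), 1))])  # add bias column
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    w -= 0.1 * Xb.T @ (p - y) / len(y)

# Uncertainty sampling: rank the pool by predictive entropy, take top k.
p = 1.0 / (1.0 + np.exp(-Xb @ w))
entropy = -(p * np.log(p + 1e-12) + (1 - p) * np.log(1 - p + 1e-12))
top_k = np.argsort(-entropy)[:100]
frac_noisy = is_noisy[top_k].mean()
print(f"fraction of selected examples from the noisy group: {frac_noisy:.2f}")
```

Because the model cannot do better than chance on the noisy group, its predictions there sit near 0.5 and dominate the entropy ranking, so nearly all of the acquisition budget is spent on examples that cannot improve the model; diversity- and uncertainty-aware variants mitigate this, as the abstract notes.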
