More data is expected to help us generalize to a task. But real datasets can contain out-of-distribution (OOD) data; this can come in the form of heterogeneity such as intra-class variability, but also in the form of temporal shifts or concept drifts. We demonstrate a counter-intuitive phenomenon for such problems: the generalization error on the task can be a non-monotonic function of the number of OOD samples. A small number of OOD samples can improve generalization, but if the number of OOD samples exceeds a threshold, the generalization error can deteriorate. We also show that if we know which samples are OOD, then using a weighted objective between the target and OOD samples ensures that the generalization error decreases monotonically. We demonstrate and analyze this phenomenon using linear classifiers on synthetic datasets and medium-sized neural networks on vision benchmarks such as MNIST, CIFAR-10, CINIC-10, PACS, and DomainNet, and observe the effect that data augmentation, hyperparameter optimization, and pre-training have on this behavior.
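To make the idea of a weighted objective between target and OOD samples concrete, here is a minimal sketch for a linear classifier. The abstract does not give the exact form of the objective, so the convex combination with a weight `alpha`, the logistic loss, and all function and parameter names below are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np


def logistic_loss(theta, X, y):
    """Mean logistic loss for labels y in {0, 1} and linear scores X @ theta."""
    z = X @ theta
    # log(1 + exp(z)) - y * z, computed stably with logaddexp
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def logistic_grad(theta, X, y):
    """Gradient of the mean logistic loss with respect to theta."""
    z = X @ theta
    p = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid
    return X.T @ (p - y) / len(y)


def weighted_objective(theta, X_tgt, y_tgt, X_ood, y_ood, alpha):
    """Assumed form: alpha * (target loss) + (1 - alpha) * (OOD loss)."""
    return (alpha * logistic_loss(theta, X_tgt, y_tgt)
            + (1.0 - alpha) * logistic_loss(theta, X_ood, y_ood))


def fit_weighted(X_tgt, y_tgt, X_ood, y_ood, alpha, lr=0.5, steps=500):
    """Minimize the weighted objective with plain gradient descent."""
    theta = np.zeros(X_tgt.shape[1])
    for _ in range(steps):
        grad = (alpha * logistic_grad(theta, X_tgt, y_tgt)
                + (1.0 - alpha) * logistic_grad(theta, X_ood, y_ood))
        theta -= lr * grad
    return theta
```

With `alpha = 1.0` the OOD samples are ignored entirely, and with `alpha = 0.0` only the OOD samples are used. In this sketch `alpha` would be treated as a hyperparameter, for example chosen by validation on held-out target data; that choice is also an assumption rather than something stated in the abstract.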
Author Information
Ashwin De Silva (Johns Hopkins University)
Rahul Ramesh (University of Pennsylvania)
Carey E Priebe (Johns Hopkins University)
Pratik Chaudhari (Univ. of Pennsylvania / AWS)
Joshua T Vogelstein (The Johns Hopkins University)
More from the Same Authors
-
2021 : Model Zoo: A Growing Brain That Learns Continually »
Rahul Ramesh · Pratik Chaudhari -
2022 : From Local to Global: Spectral-Inspired Graph Neural Networks »
Ningyuan Huang · Soledad Villar · Carey E Priebe · Da Zheng · Chengyue Huang · Lin Yang · Vladimir Braverman -
2022 : A Radiogenomics-based Coordinate System to Quantify the Heterogeneity of Glioblastoma »
Fanyang Yu · Anahita Fathi Kazerooni · Pratik Chaudhari · Christos Davatzikos -
2021 : Live Q&A Session 2 with Susan Athey, Yoshua Bengio, Sujeeth Bharadwaj, Jane Wang, Joshua Vogelstein, Weiwei Yang »
Susan Athey · Yoshua Bengio · Sujeeth Bharadwaj · Jane Wang · Weiwei Yang · Joshua T Vogelstein -
2021 : General Discussion 2 - What does the OOD problem mean to you and your field? with Anima Anandkumar, Terry Sejnowski, Chris White »
Anima Anandkumar · Terry Sejnowski · Weiwei Yang · Joshua T Vogelstein -
2021 : Extended Live Q&A Session and Learning Salon with Joshua Vogelstein »
Joshua T Vogelstein -
2021 : Live Q&A Session 1 with Yoshua Bengio, Leyla Isik, Konrad Kording, Bernhard Scholkopf, Amit Sharma, Joshua Vogelstein, Weiwei Yang »
Yoshua Bengio · Leyla Isik · Konrad Kording · Bernhard Schölkopf · Joshua T Vogelstein · Weiwei Yang -
2021 : General Discussion 1 - What is out of distribution (OOD) generalization and why is it important? with Yoshua Bengio, Leyla Isik, Max Welling »
Yoshua Bengio · Leyla Isik · Max Welling · Joshua T Vogelstein · Weiwei Yang -
2021 Workshop: Out-of-distribution generalization and adaptation in natural and artificial intelligence »
Joshua T Vogelstein · Weiwei Yang · Soledad Villar · Zenna Tavares · Johnathan Flowers · Onyema Osuagwu · Weishung Liu -
2021 : Introduction »
Weiwei Yang · Joshua T Vogelstein · Onyema Osuagwu · Soledad Villar · Johnathan Flowers · Weishung Liu · Ronan Perry · Kaleab Alemayehu Kinfu · Teresa Huang -
2021 Poster: Continuous Doubly Constrained Batch Reinforcement Learning »
Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Pratik Chaudhari · Alexander Smola -
2020 Workshop: Deep Learning through Information Geometry »
Pratik Chaudhari · Alexander Alemi · Varun Jog · Dhagash Mehta · Frank Nielsen · Stefano Soatto · Greg Ver Steeg -
2020 Poster: Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation »
Rasool Fakoor · Jonas Mueller · Nick Erickson · Pratik Chaudhari · Alexander Smola -
2017 Workshop: BigNeuro 2017: Analyzing brain data from nano to macroscale »
Eva Dyer · Gregory Kiar · William Gray Roncal · Konrad P Koerding · Joshua T Vogelstein -
2016 Workshop: Brains and Bits: Neuroscience meets Machine Learning »
Alyson Fletcher · Eva Dyer · Jascha Sohl-Dickstein · Joshua T Vogelstein · Konrad Koerding · Jakob H Macke -
2015 Workshop: BigNeuro 2015: Making sense of big neural data »
Eva Dyer · Joshua T Vogelstein · Konrad Koerding · Jeremy Freeman · Andreas S. Tolias