We conduct the first large meta-analysis of overfitting due to test set reuse in the machine learning community. Our analysis is based on over one hundred machine learning competitions hosted on the Kaggle platform over the course of several years. In each competition, numerous practitioners repeatedly evaluated their progress against a holdout set that formed the basis of a public ranking available throughout the competition. Performance on a separate test set used only once determined the final ranking. By systematically comparing the public ranking with the final ranking, we assess how much participants adapted to the holdout set over the course of a competition. Our study shows, somewhat surprisingly, little evidence of substantial overfitting. These findings speak to the robustness of the holdout method across different data domains, loss functions, model classes, and human analysts.
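To make the public-versus-final comparison concrete, here is a minimal sketch for a single competition. It assumes each submission has a public (holdout) score and a private (test) score under a higher-is-better metric. It computes two simple diagnostics: the average public-minus-private gap, which tends to be positive when submissions were tuned to the holdout set, and Kendall's tau between the two rankings. The function names and toy scores are hypothetical and illustrative only. They are not data from the study, and this is not necessarily the exact procedure the authors used.

```python
from itertools import combinations
from statistics import mean


def mean_public_private_gap(public, private):
    """Average of (public - private) over submissions.

    Assumes a higher-is-better metric (e.g. accuracy); for a loss such as
    log loss, flip the sign. A clearly positive value suggests the public
    (holdout) scores were optimistic relative to the once-used test set.
    """
    if len(public) != len(private) or not public:
        raise ValueError("need two non-empty score lists of equal length")
    return mean(p - q for p, q in zip(public, private))


def kendall_tau(xs, ys):
    """Kendall's tau-a rank agreement; tied pairs count as neither."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical toy scores for five submissions (not from the study).
public = [0.91, 0.89, 0.88, 0.85, 0.80]
private = [0.90, 0.89, 0.86, 0.86, 0.79]

gap = mean_public_private_gap(public, private)
tau = kendall_tau(public, private)
print(f"mean public - private gap: {gap:.3f}")  # 0.006
print(f"Kendall tau (public vs. final): {tau:.2f}")  # 0.90
```

A small gap and a tau close to 1 mean that the public ranking largely predicted the final one. A large positive gap or a low tau would point toward adaptive overfitting to the holdout set.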
Author Information
Becca Roelofs (UC Berkeley)
Vaishaal Shankar (UC Berkeley)
Benjamin Recht (UC Berkeley)
Sara Fridovich-Keil (UC Berkeley)
Moritz Hardt (University of California, Berkeley)
John Miller (University of California, Berkeley)
Ludwig Schmidt (UC Berkeley)
More from the Same Authors
2021 : Are We Learning Yet? A Meta Review of Evaluation Failures Across Machine Learning »
Thomas Liao · Rohan Taori · Deborah Raji · Ludwig Schmidt -
2021 : Do ImageNet Classifiers Generalize to ImageNet? »
Benjamin Recht · Becca Roelofs · Ludwig Schmidt · Vaishaal Shankar -
2021 : Evaluating Machine Accuracy on ImageNet »
Vaishaal Shankar · Becca Roelofs · Horia Mania · Benjamin Recht · Ludwig Schmidt -
2021 : Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2021 : Robust fine-tuning of zero-shot models »
Mitchell Wortsman · Gabriel Ilharco · Jong Wook Kim · Mike Li · Hanna Hajishirzi · Ali Farhadi · Hongseok Namkoong · Ludwig Schmidt -
2021 : Alternative Microfoundations for Strategic Classification »
Meena Jagadeesan · Celestine Mendler-Dünner · Moritz Hardt -
2022 : Causal Inference out of Control: Identifying the Steerability of Consumption »
Gary Cheng · Moritz Hardt · Celestine Mendler-Dünner -
2022 Poster: When does dough become a bagel? Analyzing the remaining mistakes on ImageNet »
Vijay Vasudevan · Benjamin Caine · Raphael Gontijo Lopes · Sara Fridovich-Keil · Rebecca Roelofs -
2022 Poster: Models Out of Line: A Fourier Lens on Distribution Shift Robustness »
Sara Fridovich-Keil · Brian Bartoldson · James Diffenderfer · Bhavya Kailkhura · Timo Bremer -
2022 Poster: Spectral Bias in Practice: The Role of Function Frequency in Generalization »
Sara Fridovich-Keil · Raphael Gontijo Lopes · Rebecca Roelofs -
2021 : Microfoundations of Algorithmic decisions »
Moritz Hardt -
2021 Oral: Retiring Adult: New Datasets for Fair Machine Learning »
Frances Ding · Moritz Hardt · John Miller · Ludwig Schmidt -
2021 Poster: Retiring Adult: New Datasets for Fair Machine Learning »
Frances Ding · Moritz Hardt · John Miller · Ludwig Schmidt -
2021 Poster: Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning »
Timo Milbich · Karsten Roth · Samarth Sinha · Ludwig Schmidt · Marzyeh Ghassemi · Bjorn Ommer -
2021 Poster: Soft Calibration Objectives for Neural Networks »
Archit Karandikar · Nicholas Cain · Dustin Tran · Balaji Lakshminarayanan · Jonathon Shlens · Michael Mozer · Becca Roelofs -
2020 : Contributed Talk 6: Do Offline Metrics Predict Online Performance in Recommender Systems? »
Karl Krauth · Sarah Dean · Wenshuo Guo · Benjamin Recht · Michael Jordan -
2020 : Invited Talk 7: Prediction Dynamics »
Moritz Hardt -
2020 Workshop: Consequential Decisions in Dynamic Environments »
Niki Kilbertus · Angela Zhou · Ashia Wilson · John Miller · Lily Hu · Lydia T. Liu · Nathan Kallus · Shira Mitchell -
2020 : Tutorial: A brief tutorial on causality and fair decision making »
Moritz Hardt -
2020 Poster: Stochastic Optimization for Performative Prediction »
Celestine Mendler-Dünner · Juan Perdomo · Tijana Zrnic · Moritz Hardt -
2020 Poster: Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains »
Matthew Tancik · Pratul Srinivasan · Ben Mildenhall · Sara Fridovich-Keil · Nithin Raghavan · Utkarsh Singhal · Ravi Ramamoorthi · Jonathan Barron · Ren Ng -
2020 Spotlight: Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains »
Matthew Tancik · Pratul Srinivasan · Ben Mildenhall · Sara Fridovich-Keil · Nithin Raghavan · Utkarsh Singhal · Ravi Ramamoorthi · Jonathan Barron · Ren Ng -
2020 Oral: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent »
Benjamin Recht · Christopher Ré · Stephen Wright · Feng Niu -
2020 Poster: Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2020 Spotlight: Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2019 Poster: Model Similarity Mitigates Test Set Overuse »
Horia Mania · John Miller · Ludwig Schmidt · Moritz Hardt · Benjamin Recht -
2019 Poster: Unlabeled Data Improves Adversarial Robustness »
Yair Carmon · Aditi Raghunathan · Ludwig Schmidt · John Duchi · Percy Liang -
2019 Poster: Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator »
Karl Krauth · Stephen Tu · Benjamin Recht -
2019 Poster: Certainty Equivalence is Efficient for Linear Quadratic Control »
Horia Mania · Stephen Tu · Benjamin Recht -
2018 Poster: Simple random search of static linear policies is competitive for reinforcement learning »
Horia Mania · Aurelia Guy · Benjamin Recht -
2018 Poster: Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator »
Sarah Dean · Horia Mania · Nikolai Matni · Benjamin Recht · Stephen Tu -
2017 : Safety beyond Security: Societal Challenges for Machine Learning »
Moritz Hardt -
2017 Workshop: OPT 2017: Optimization for Machine Learning »
Suvrit Sra · Sashank J. Reddi · Alekh Agarwal · Benjamin Recht -
2017 Poster: Avoiding Discrimination through Causal Reasoning »
Niki Kilbertus · Mateo Rojas Carulla · Giambattista Parascandolo · Moritz Hardt · Dominik Janzing · Bernhard Schölkopf -
2017 Poster: The Marginal Value of Adaptive Gradient Methods in Machine Learning »
Ashia C Wilson · Becca Roelofs · Mitchell Stern · Nati Srebro · Benjamin Recht -
2017 Oral: The Marginal Value of Adaptive Gradient Methods in Machine Learning »
Ashia C Wilson · Becca Roelofs · Mitchell Stern · Nati Srebro · Benjamin Recht -
2017 Oral: Test of Time Award »
Ali Rahimi · Benjamin Recht -
2017 Tutorial: Fairness in Machine Learning »
Solon Barocas · Moritz Hardt -
2016 : Convolutional Kitchen Sinks for Transcription Factor Binding Site Prediction. »
Vaishaal Shankar -
2016 Poster: The Power of Adaptivity in Identifying Statistical Alternatives »
Kevin Jamieson · Daniel Haas · Benjamin Recht -
2016 Poster: Cyclades: Conflict-free Asynchronous Machine Learning »
Xinghao Pan · Maximilian Lam · Stephen Tu · Dimitris Papailiopoulos · Ce Zhang · Michael Jordan · Kannan Ramchandran · Christopher Ré · Benjamin Recht -
2016 Poster: Equality of Opportunity in Supervised Learning »
Moritz Hardt · Eric Price · Nati Srebro -
2015 Workshop: Adaptive Data Analysis »
Adam Smith · Aaron Roth · Vitaly Feldman · Moritz Hardt -
2015 Poster: Generalization in Adaptive Data Analysis and Holdout Reuse »
Cynthia Dwork · Vitaly Feldman · Moritz Hardt · Toni Pitassi · Omer Reingold · Aaron Roth -
2015 Poster: Parallel Correlation Clustering on Big Graphs »
Xinghao Pan · Dimitris Papailiopoulos · Samet Oymak · Benjamin Recht · Kannan Ramchandran · Michael Jordan -
2015 Poster: Differentially Private Learning of Structured Discrete Distributions »
Ilias Diakonikolas · Moritz Hardt · Ludwig Schmidt -
2014 Workshop: Fairness, Accountability, and Transparency in Machine Learning »
Moritz Hardt · Solon Barocas -
2014 Poster: The Noisy Power Method: A Meta Algorithm with Applications »
Moritz Hardt · Eric Price -
2014 Spotlight: The Noisy Power Method: A Meta Algorithm with Applications »
Moritz Hardt · Eric Price