Timezone: »
Unlike carefully curated academic benchmarks, real-world datasets are often highly class-imbalanced, especially in safety-critical scenarios. Through extensive empirical investigation, we study a number of foundational learning behaviors for various models such as neural networks, gradient-boosted decision trees, and SVMs under class imbalance across a range of domains. Motivated by our observation that re-balancing class-imbalanced training data is ineffective, we show that several simple techniques for improving representation learning are effective in this setting: (1) self-supervised pre-training is insensitive to imbalance and can be used for feature learning before fine-tuning on labels; (2) Bayesian inference is effective because neural networks are especially underspecified under class imbalance; (3) flatness-seeking regularization pulls decision boundaries away from minority samples, especially when we seek minima that are particularly flat on the minority samples’ loss.
Author Information
Ravid Shwartz-Ziv (Hebrew University of Jerusalem)
Micah Goldblum (University of Maryland)
Yucen (Lily) Li (New York University)
C. Bayan Bruss (Capital One)
Andrew Gordon Wilson (New York University)
More from the Same Authors
-
2020 : An Open Review of OpenReview: A Critical Analysis of the Machine Learning Conference Review Process »
David Tran · Alex Valtchanov · Keshav R Ganapathy · Raymond Feng · Eric Slud · Micah Goldblum · Tom Goldstein -
2021 : Robust Reinforcement Learning for Shifting Dynamics During Deployment »
Samuel Stanton · Rasool Fakoor · Jonas Mueller · Andrew Gordon Wilson · Alexander Smola -
2021 : A Closer Look at Distribution Shifts and Out-of-Distribution Generalization on Graphs »
Mucong Ding · Kezhi Kong · Jiuhai Chen · John Kirchenbauer · Micah Goldblum · David P Wipf · Furong Huang · Tom Goldstein -
2022 : Investigating Reproducibility from the Decision Boundary Perspective. »
Gowthami Somepalli · Arpit Bansal · Liam Fowl · Ping-yeh Chiang · Yehuda Dar · Richard Baraniuk · Micah Goldblum · Tom Goldstein -
2022 : A Deep Dive into Dataset Imbalance and Bias in Face Identification »
Valeriia Cherepanova · Steven Reich · Samuel Dooley · Hossein Souri · John Dickerson · Micah Goldblum · Tom Goldstein -
2022 : SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training »
Gowthami Somepalli · Avi Schwarzschild · Micah Goldblum · C. Bayan Bruss · Tom Goldstein -
2022 : Transfer Learning with Deep Tabular Models »
Roman Levin · Valeriia Cherepanova · Avi Schwarzschild · Arpit Bansal · C. Bayan Bruss · Tom Goldstein · Andrew Wilson · Micah Goldblum -
2022 : A Deep Dive into Dataset Imbalance and Bias in Face Identification »
Valeriia Cherepanova · Steven Reich · Samuel Dooley · Hossein Souri · John Dickerson · Micah Goldblum · Tom Goldstein -
2022 : On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition »
Samuel Dooley · Rhea Sukthanker · John Dickerson · Colin White · Frank Hutter · Micah Goldblum -
2022 : On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition »
Samuel Dooley · Rhea Sukthanker · John Dickerson · Colin White · Frank Hutter · Micah Goldblum -
2022 : A Deep Dive into Dataset Imbalance and Bias in Face Identification »
Valeriia Cherepanova · Steven Reich · Samuel Dooley · Hossein Souri · John Dickerson · Micah Goldblum · Tom Goldstein -
2022 : Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries »
Yuxin Wen · Arpit Bansal · Hamid Kazemi · Eitan Borgnia · Micah Goldblum · Jonas Geiping · Tom Goldstein -
2022 : Panning for Gold in Federated Learning: Targeted Text Extraction under Arbitrarily Large-Scale Aggregation »
Hong-Min Chu · Jonas Geiping · Liam Fowl · Micah Goldblum · Tom Goldstein -
2022 : Decepticons: Corrupted Transformers Breach Privacy in Federated Learning for Language Models »
Liam Fowl · Jonas Geiping · Steven Reich · Yuxin Wen · Wojciech Czaja · Micah Goldblum · Tom Goldstein -
2022 : DP-InstaHide: Data Augmentations Provably Enhance Guarantees Against Dataset Manipulations »
Eitan Borgnia · Jonas Geiping · Valeriia Cherepanova · Liam Fowl · Arjun Gupta · Amin Ghiasi · Furong Huang · Micah Goldblum · Tom Goldstein -
2023 : An Information-Theoretic Understanding of Maximum Manifold Capacity Representations »
Berivan Isik · Victor Lecomte · Rylan Schaeffer · Yann LeCun · Mikail Khona · Ravid Shwartz-Ziv · Sanmi Koyejo · Andrey Gromov -
2023 : Non-Vacuous Generalization Bounds for Large Language Models »
Sanae Lotfi · Marc Finzi · Yilun Kuang · Tim G. J. Rudner · Micah Goldblum · Andrew Wilson -
2023 : An Information-Theoretic Understanding of Maximum Manifold Capacity Representations »
Berivan Isik · Rylan Schaeffer · Victor Lecomte · Mikail Khona · Yann LeCun · Sanmi Koyejo · Andrey Gromov · Ravid Shwartz-Ziv -
2023 : A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning »
Valeriia Cherepanova · Roman Levin · Gowthami Somepalli · Jonas Geiping · C. Bayan Bruss · Andrew Wilson · Tom Goldstein · Micah Goldblum -
2023 : Non-Vacuous Generalization Bounds for Large Language Models »
Sanae Lotfi · Marc Finzi · Yilun Kuang · Tim G. J. Rudner · Micah Goldblum · Andrew Wilson -
2023 : An Information-Theoretic Understanding of Maximum Manifold Capacity Representations »
Berivan Isik · Victor Lecomte · Rylan Schaeffer · Yann LeCun · Mikail Khona · Ravid Shwartz-Ziv · Sanmi Koyejo · Andrey Gromov -
2023 : A Simple and Efficient Baseline for Data Attribution on Images »
Vasu Singla · Pedro Sandoval-Segura · Micah Goldblum · Jonas Geiping · Tom Goldstein -
2023 : An Information-Theoretic Understanding of Maximum Manifold Capacity Representations »
Rylan Schaeffer · Berivan Isik · Victor Lecomte · Mikail Khona · Yann LeCun · Andrey Gromov · Ravid Shwartz-Ziv · Sanmi Koyejo -
2023 : An Information-Theoretic Understanding of Maximum Manifold Capacity Representations »
Rylan Schaeffer · Berivan Isik · Victor Lecomte · Mikail Khona · Yann LeCun · Andrey Gromov · Ravid Shwartz-Ziv · Sanmi Koyejo -
2023 Workshop: Backdoors in Deep Learning: The Good, the Bad, and the Ugly »
Khoa D Doan · Aniruddha Saha · Anh Tran · Yingjie Lao · Kok-Seng Wong · Ang Li · HARIPRIYA HARIKUMAR · Eugene Bagdasaryan · Micah Goldblum · Tom Goldstein -
2023 Poster: What Can We Learn from Unlearnable Datasets? »
Pedro Sandoval-Segura · Vasu Singla · Jonas Geiping · Micah Goldblum · Tom Goldstein -
2023 Poster: When Do Neural Nets Outperform Boosted Trees on Tabular Data? »
Duncan McElfresh · Sujay Khandagale · Jonathan Valverde · Vishak Prasad C · Ganesh Ramakrishnan · Micah Goldblum · Colin White -
2023 Poster: Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks »
Micah Goldblum · Hossein Souri · Renkun Ni · Manli Shu · Viraj Prabhu · Gowthami Somepalli · Prithvijit Chattopadhyay · Mark Ibrahim · Adrien Bardes · Judy Hoffman · Rama Chellappa · Andrew Wilson · Tom Goldstein -
2023 Poster: Simplifying Neural Network Training Under Class Imbalance »
Ravid Shwartz-Ziv · Micah Goldblum · Yucen Li · C. Bayan Bruss · Andrew Wilson -
2023 Poster: Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition »
Samuel Dooley · Rhea Sukthanker · John Dickerson · Colin White · Frank Hutter · Micah Goldblum -
2023 Oral: Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition »
Samuel Dooley · Rhea Sukthanker · John Dickerson · Colin White · Frank Hutter · Micah Goldblum -
2023 Poster: Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise »
Arpit Bansal · Eitan Borgnia · Hong-Min Chu · Jie Li · Hamid Kazemi · Furong Huang · Micah Goldblum · Jonas Geiping · Tom Goldstein -
2023 Poster: Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery »
Yuxin Wen · Neel Jain · John Kirchenbauer · Micah Goldblum · Jonas Geiping · Tom Goldstein -
2023 Poster: An Information Theory Perspective on Variance-Invariance-Covariance Regularization »
Ravid Shwartz-Ziv · Randall Balestriero · Kenji Kawaguchi · Tim G. J. Rudner · Yann LeCun -
2023 Poster: Reverse Engineering Self-Supervised Learning »
Ido Ben-Shaul · Ravid Shwartz-Ziv · Tomer Galanti · Shai Dekel · Yann LeCun -
2023 Poster: Understanding and Mitigating Copying in Diffusion Models »
Gowthami Somepalli · Vasu Singla · Micah Goldblum · Jonas Geiping · Tom Goldstein -
2023 Poster: A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning »
Valeriia Cherepanova · Roman Levin · Gowthami Somepalli · Jonas Geiping · C. Bayan Bruss · Andrew Wilson · Tom Goldstein · Micah Goldblum -
2022 Workshop: Graph Learning for Industrial Applications: Finance, Crime Detection, Medicine and Social Media »
Manuela Veloso · John Dickerson · Senthil Kumar · Eren K. · Jian Tang · Jie Chen · Peter Henstock · Susan Tibbs · Ani Calinescu · Naftali Cohen · C. Bayan Bruss · Armineh Nourbakhsh -
2022 : Andrew Gordon Wilson: When Bayesian Orthodoxy Can Go Wrong: Model Selection and Out-of-Distribution Generalization »
Andrew Gordon Wilson -
2022 : Andrew Gordon Wilson: When Bayesian Orthodoxy Can Go Wrong: Model Selection and Out-of-Distribution Generalization »
Andrew Gordon Wilson -
2022 : Transfer Learning with Deep Tabular Models »
Roman Levin · Valeriia Cherepanova · Avi Schwarzschild · Arpit Bansal · C. Bayan Bruss · Tom Goldstein · Andrew Wilson · Micah Goldblum -
2022 Poster: Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability »
Roman Levin · Manli Shu · Eitan Borgnia · Furong Huang · Micah Goldblum · Tom Goldstein -
2022 Poster: Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers »
Wanqian Yang · Polina Kirichenko · Micah Goldblum · Andrew Wilson -
2022 Poster: Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors »
Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson -
2022 Poster: Autoregressive Perturbations for Data Poisoning »
Pedro Sandoval-Segura · Vasu Singla · Jonas Geiping · Micah Goldblum · Tom Goldstein · David Jacobs -
2022 Poster: Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch »
Hossein Souri · Liam Fowl · Rama Chellappa · Micah Goldblum · Tom Goldstein -
2022 Poster: PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization »
Sanae Lotfi · Marc Finzi · Sanyam Kapoor · Andres Potapczynski · Micah Goldblum · Andrew Wilson -
2022 Poster: End-to-end Algorithm Synthesis with Recurrent Networks: Extrapolation without Overthinking »
Arpit Bansal · Avi Schwarzschild · Eitan Borgnia · Zeyad Emam · Furong Huang · Micah Goldblum · Tom Goldstein -
2021 Workshop: Bayesian Deep Learning »
Yarin Gal · Yingzhen Li · Sebastian Farquhar · Christos Louizos · Eric Nalisnick · Andrew Gordon Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2021 : A Closer Look at Distribution Shifts and Out-of-Distribution Generalization on Graphs »
Mucong Ding · Kezhi Kong · Jiuhai Chen · John Kirchenbauer · Micah Goldblum · David P Wipf · Furong Huang · Tom Goldstein -
2021 Poster: Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks »
Avi Schwarzschild · Eitan Borgnia · Arjun Gupta · Furong Huang · Uzi Vishkin · Micah Goldblum · Tom Goldstein -
2021 : Evaluating Approximate Inference in Bayesian Deep Learning + Q&A »
Andrew Gordon Wilson · Pavel Izmailov · Matthew Hoffman · Yarin Gal · Yingzhen Li · Melanie F. Pradier · Sharad Vikram · Andrew Foong · Sanae Lotfi · Sebastian Farquhar -
2021 Poster: Adversarial Examples Make Strong Poisons »
Liam Fowl · Micah Goldblum · Ping-yeh Chiang · Jonas Geiping · Wojciech Czaja · Tom Goldstein -
2021 Poster: Encoding Robustness to Image Style via Adversarial Feature Perturbations »
Manli Shu · Zuxuan Wu · Micah Goldblum · Tom Goldstein -
2020 : Lightning Talk 4: Latent-CF: A Simple Baseline for Reverse Counterfactual Explanations »
C. Bayan Bruss · Rachana Balasubramanian · Brian Barr · Samuel Sharpe · Jason Wittenbach -
2020 : The Intrinsic Dimension of Images and Its Impact on Learning »
Chen Zhu · Micah Goldblum · Ahmed Abdelkader · Tom Goldstein · Phillip Pope -
2020 Workshop: Fair AI in Finance »
Senthil Kumar · Cynthia Rudin · John Paisley · Isabelle Moulinier · C. Bayan Bruss · Eren K. · Susan Tibbs · Oluwatobi Olabiyi · Simona Gandrabur · Svitlana Vyetrenko · Kevin Compher -
2020 Workshop: Workshop on Dataset Curation and Security »
Nathalie Baracaldo · Yonatan Bisk · Avrim Blum · Michael Curry · John Dickerson · Micah Goldblum · Tom Goldstein · Bo Li · Avi Schwarzschild -
2020 Poster: Adversarially Robust Few-Shot Learning: A Meta-Learning Approach »
Micah Goldblum · Liam Fowl · Tom Goldstein -
2019 Poster: Exact Gaussian Processes on a Million Data Points »
Ke Alexander Wang · Geoff Pleiss · Jacob Gardner · Stephen Tyree · Kilian Weinberger · Andrew Gordon Wilson -
2019 Poster: Function-Space Distributions over Kernels »
Gregory Benton · Wesley Maddox · Jayson Salkey · Julio Albinati · Andrew Gordon Wilson -
2019 Poster: A Simple Baseline for Bayesian Uncertainty in Deep Learning »
Wesley Maddox · Pavel Izmailov · Timur Garipov · Dmitry Vetrov · Andrew Gordon Wilson