Several works have proposed Simplicity Bias (SB), the tendency of standard training procedures such as Stochastic Gradient Descent (SGD) to find simple models, to justify why neural networks generalize well [Arpit et al. 2017, Nakkiran et al. 2019, Valle-Perez et al. 2019]. However, the precise notion of simplicity remains vague. Furthermore, previous settings [Soudry et al. 2018, Gunasekar et al. 2018] that use SB to theoretically justify why neural networks generalize well do not simultaneously capture the non-robustness of neural networks, a widely observed phenomenon in practice [Goodfellow et al. 2014, Jo and Bengio 2017]. We attempt to reconcile SB and the superior standard generalization of neural networks with the non-robustness observed in practice by introducing piecewise-linear and image-based datasets, which (a) incorporate a precise notion of simplicity, (b) comprise multiple predictive features with varying levels of simplicity, and (c) capture the non-robustness of neural networks trained on real data. Using theory and empirics on these datasets, we make four observations: (i) SB of SGD and its variants can be extreme: neural networks can exclusively rely on the simplest feature and remain invariant to all predictive complex features. (ii) The extreme aspect of SB could explain why seemingly benign distribution shifts and small adversarial perturbations significantly degrade model performance. (iii) Contrary to conventional wisdom, SB can also hurt generalization on the same data distribution, as SB persists even when the simplest feature has less predictive power than the more complex features. (iv) Common approaches to improve generalization and robustness, namely ensembles and adversarial training, can fail to mitigate SB and its pitfalls. Given the role of SB in training neural networks, we hope that the proposed datasets and methods serve as an effective testbed to evaluate novel algorithmic approaches aimed at avoiding the pitfalls of SB.
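To make observations (i) and (ii) concrete, here is a toy sketch in Python. It is our own illustration under stated assumptions, not the authors' datasets or code: coordinate 0 is a linearly separable (simple) feature, coordinate 1 is a three-slab (more complex, piecewise-linear) feature, and the slab boundaries `INNER`, `OUTER` and the `margin` are hypothetical choices. Both coordinates predict the label perfectly. The sketch does not train a network; the hand-written `simple_only` and `slab_only` predictors stand in for models that use a single feature, and the hypothetical helper `randomize_coordinate` probes reliance on a feature by permuting it across examples.

```python
import random

# Hypothetical slab layout for the "complex" coordinate (our assumption,
# not necessarily the authors' construction).
INNER = 0.2  # y = +1 is drawn from the middle slab [-INNER, INNER]
OUTER = 0.4  # y = -1 is drawn from [-1, -OUTER] or [OUTER, 1]


def sample_linear_slab(n, margin=0.1, seed=0):
    """Toy 2-D dataset in which both coordinates fully predict the label.

    Coordinate 0 is the simple feature: one threshold at 0 separates it.
    Coordinate 1 is a 3-slab feature: separating it needs two thresholds.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = 1 if i % 2 == 0 else -1  # exactly balanced labels
        # Simple feature: linearly separable with a margin around 0.
        x0 = y * rng.uniform(margin, 1.0)
        # Complex feature: positives in the middle slab, negatives in an outer slab.
        if y == 1:
            x1 = rng.uniform(-INNER, INNER)
        else:
            x1 = rng.choice([-1, 1]) * rng.uniform(OUTER, 1.0)
        data.append(((x0, x1), y))
    return data


def accuracy(predict, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in data) / len(data)


def randomize_coordinate(data, j, seed=1):
    """Permute coordinate j across examples.

    This keeps the coordinate's marginal distribution but breaks its
    association with the label; all other coordinates are left unchanged.
    """
    rng = random.Random(seed)
    column = [x[j] for x, _ in data]
    rng.shuffle(column)
    out = []
    for (x, y), v in zip(data, column):
        x_new = list(x)
        x_new[j] = v
        out.append((tuple(x_new), y))
    return out


def simple_only(x):
    """Stand-in for a model that relies exclusively on the simple feature."""
    return 1 if x[0] > 0 else -1


def slab_only(x):
    """Stand-in for a model that relies only on the 3-slab feature."""
    return 1 if abs(x[1]) < (INNER + OUTER) / 2 else -1


if __name__ == "__main__":
    data = sample_linear_slab(2000)
    for name, model in [("simple_only", simple_only), ("slab_only", slab_only)]:
        print(
            name,
            "clean:", accuracy(model, data),
            "coord 0 permuted:", accuracy(model, randomize_coordinate(data, 0)),
            "coord 1 permuted:", accuracy(model, randomize_coordinate(data, 1)),
        )
```

Permuting coordinate 1 leaves `simple_only` at accuracy 1.0, so it is invariant to the complex feature, whereas permuting coordinate 0 drops it to roughly chance (about 0.5 with balanced labels). A model that relies exclusively on the simplest feature inherits exactly this fragility: a shift or perturbation that touches only that feature can destroy its accuracy even though the complex feature still predicts the label.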
Author Information
Harshay Shah (Microsoft Research)
Kaustav Tamuly (Microsoft Research)
Aditi Raghunathan (Stanford University)
Prateek Jain (Microsoft Research)
Praneeth Netrapalli (Microsoft Research)
More from the Same Authors
2021 Spotlight: Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems
Suhas Kowshik · Dheeraj Nagaraj · Prateek Jain · Praneeth Netrapalli
2021 Spotlight: Differentially Private Model Personalization
Prateek Jain · John Rush · Adam Smith · Shuang Song · Abhradeep Guha Thakurta
2021 : Calibrated Ensembles: A Simple Way to Mitigate ID-OOD Accuracy Tradeoffs
Ananya Kumar · Aditi Raghunathan · Tengyu Ma · Percy Liang
2022 : MET: Masked Encoding for Tabular Data
Kushal Majmundar · Sachin Goyal · Praneeth Netrapalli · Prateek Jain
2022 : A Unified Framework for Comparing Learning Algorithms
Harshay Shah · Sung Min Park · Andrew Ilyas · Aleksander Madry
2022 : Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks
Sravanti Addepalli · Anshul Nasery · Venkatesh Babu R · Praneeth Netrapalli · Prateek Jain
2022 Poster: DP-PCA: Statistically Optimal and Differentially Private PCA
Xiyang Liu · Weihao Kong · Prateek Jain · Sewoong Oh
2022 Poster: S3GC: Scalable Self-Supervised Graph Clustering
Fnu Devvrit · Aditya Sinha · Inderjit Dhillon · Prateek Jain
2022 Poster: Reproducibility in Optimization: Theoretical Framework and Limits
Kwangjun Ahn · Prateek Jain · Ziwei Ji · Satyen Kale · Praneeth Netrapalli · Gil I Shamir
2022 Poster: Matryoshka Representation Learning
Aditya Kusupati · Gantavya Bhatt · Aniket Rege · Matthew Wallingford · Aditya Sinha · Vivek Ramanujan · William Howard-Snyder · Kaifeng Chen · Sham Kakade · Prateek Jain · Ali Farhadi
2021 : Invited Talk: Lessons from robust machine learning
Aditi Raghunathan
2021 Poster: Differentially Private Model Personalization
Prateek Jain · John Rush · Adam Smith · Shuang Song · Abhradeep Guha Thakurta
2021 Poster: Streaming Linear System Identification with Reverse Experience Replay
Suhas Kowshik · Dheeraj Nagaraj · Prateek Jain · Praneeth Netrapalli
2021 Poster: LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes
Aditya Kusupati · Matthew Wallingford · Vivek Ramanujan · Raghav Somani · Jae Sung Park · Krishna Pillutla · Prateek Jain · Sham Kakade · Ali Farhadi
2021 Poster: Do Input Gradients Highlight Discriminative Features?
Harshay Shah · Prateek Jain · Praneeth Netrapalli
2021 Poster: Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems
Suhas Kowshik · Dheeraj Nagaraj · Prateek Jain · Praneeth Netrapalli
2021 Poster: Statistically and Computationally Efficient Linear Meta-representation Learning
Kiran Thekumparampil · Prateek Jain · Praneeth Netrapalli · Sewoong Oh
2020 Poster: Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method
Kiran Thekumparampil · Prateek Jain · Praneeth Netrapalli · Sewoong Oh
2020 Spotlight: Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method
Kiran Thekumparampil · Prateek Jain · Praneeth Netrapalli · Sewoong Oh
2020 Poster: RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference
Oindrila Saha · Aditya Kusupati · Harsha Vardhan Simhadri · Manik Varma · Prateek Jain
2020 Spotlight: RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference
Oindrila Saha · Aditya Kusupati · Harsha Vardhan Simhadri · Manik Varma · Prateek Jain
2020 Poster: Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms
Dheeraj Nagaraj · Xian Wu · Guy Bresler · Prateek Jain · Praneeth Netrapalli
2020 Poster: Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games
Arun Suggala · Praneeth Netrapalli
2020 Spotlight: Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms
Dheeraj Nagaraj · Xian Wu · Guy Bresler · Prateek Jain · Praneeth Netrapalli
2020 Poster: Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
Sumanth Dathathri · Krishnamurthy Dvijotham · Alexey Kurakin · Aditi Raghunathan · Jonathan Uesato · Rudy Bunel · Shreya Shankar · Jacob Steinhardt · Ian Goodfellow · Percy Liang · Pushmeet Kohli
2020 Poster: MOReL: Model-Based Offline Reinforcement Learning
Rahul Kidambi · Aravind Rajeswaran · Praneeth Netrapalli · Thorsten Joachims
2019 : Lunch Break and Posters
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu
2019 : Contributed talk: What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization?
Praneeth Netrapalli
2019 : Break / Poster Session 1
Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · Gaël Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang
2019 Poster: Efficient Algorithms for Smooth Minimax Optimization
Kiran Thekumparampil · Prateek Jain · Praneeth Netrapalli · Sewoong Oh
2019 Poster: The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
Rong Ge · Sham Kakade · Rahul Kidambi · Praneeth Netrapalli
2019 Poster: Unlabeled Data Improves Adversarial Robustness
Yair Carmon · Aditi Raghunathan · Ludwig Schmidt · John Duchi · Percy Liang
2018 Poster: Support Recovery for Orthogonal Matching Pursuit: Upper and Lower bounds
Raghav Somani · Chirag Gupta · Prateek Jain · Praneeth Netrapalli
2018 Spotlight: Support Recovery for Orthogonal Matching Pursuit: Upper and Lower bounds
Raghav Somani · Chirag Gupta · Prateek Jain · Praneeth Netrapalli
2018 Poster: Semidefinite relaxations for certifying robustness to adversarial examples
Aditi Raghunathan · Jacob Steinhardt · Percy Liang
2017 Poster: Learning Mixture of Gaussians with Streaming Data
Aditi Raghunathan · Prateek Jain · Ravishankar Krishnaswamy