Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a \emph{threshold} on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold. ATC outperforms previous methods across several model architectures, types of distribution shifts (e.g., due to synthetic corruptions, dataset reproduction, or novel subpopulations), and datasets (\textsc{Wilds}-FMoW, ImageNet, \textsc{Breeds}, CIFAR, and MNIST). In our experiments, ATC estimates target performance $2\text{--}4\times$ more accurately than prior methods. We also explore the theoretical foundations of the problem, proving that, in general, identifying the accuracy is just as hard as identifying the optimal predictor; thus, the efficacy of any method rests upon (perhaps unstated) assumptions on the nature of the shift. Finally, by analyzing our method on some toy distributions, we provide insights concerning when it works.
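To make the procedure concrete, here is a minimal NumPy sketch of this style of estimator. It assumes the threshold is fit on held-out labeled source data so that the fraction of source examples scoring below it matches the source error rate, and it uses the maximum softmax probability as the confidence score (a negative-entropy score is another natural choice); all function and variable names are illustrative, not from any released code.

```python
import numpy as np

def atc_estimate(source_probs, source_labels, target_probs):
    """Estimate target accuracy from model confidences (ATC-style sketch).

    source_probs:  (n_source, n_classes) softmax outputs on held-out,
                   labeled source data.
    source_labels: (n_source,) ground-truth source labels.
    target_probs:  (n_target, n_classes) softmax outputs on unlabeled
                   target data.
    """
    # Confidence score: maximum softmax probability per example.
    source_scores = source_probs.max(axis=1)
    target_scores = target_probs.max(axis=1)

    # Error rate of the model on the labeled source split.
    source_err = (source_probs.argmax(axis=1) != source_labels).mean()

    # Learn the threshold: choose t so that the fraction of source
    # examples with confidence below t equals the source error rate.
    t = np.quantile(source_scores, source_err)

    # Predicted target accuracy: fraction of unlabeled target examples
    # whose confidence exceeds the learned threshold.
    return float((target_scores > t).mean())
```

By construction, matching the sub-threshold mass to the source error means the estimator recovers the known source accuracy on source data; how well it transfers to target data then hinges on assumptions about the shift, as the hardness result above emphasizes.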
Author Information
Saurabh Garg (Carnegie Mellon University)
Sivaraman Balakrishnan (Carnegie Mellon University)
Zachary Lipton (Carnegie Mellon University)
Behnam Neyshabur (Google)
I am a staff research scientist at Google. Before that, I was a postdoctoral researcher at New York University and a member of the Theoretical Machine Learning program at the Institute for Advanced Study (IAS) in Princeton. In the summer of 2017, I received a PhD in computer science from TTI-Chicago, where I was fortunate to be advised by Nati Srebro.
Hanie Sedghi (Google Research)
I am a senior research scientist at Google Brain, where I lead the “Deep Phenomena” team. My approach is to bridge theory and practice in large-scale machine learning by designing algorithms with theoretical guarantees that also work efficiently in practice. In recent years, I have been working on understanding and improving deep learning. Prior to Google, I was a research scientist at the Allen Institute for Artificial Intelligence and, before that, a postdoctoral fellow at UC Irvine. I received my PhD, with a minor in mathematics, from the University of Southern California in 2015.
More from the Same Authors
- 2021 Spotlight: Mixture Proportion Estimation and PU Learning: A Modern Approach »
  Saurabh Garg · Yifan Wu · Alexander Smola · Sivaraman Balakrishnan · Zachary Lipton
- 2021 Spotlight: Efficient Online Estimation of Causal Effects by Deciding What to Observe »
  Shantanu Gupta · Zachary Lipton · David Childers
- 2021 Spotlight: Parametric Complexity Bounds for Approximating PDEs with Neural Networks »
  Tanya Marwah · Zachary Lipton · Andrej Risteski
- 2021 : Avoiding Spurious Correlations: Bridging Theory and Practice »
  Thao Nguyen · Hanie Sedghi · Behnam Neyshabur
- 2022 : Downstream Datasets Make Surprisingly Good Pretraining Corpora »
  Kundan Krishna · Saurabh Garg · Jeffrey Bigham · Zachary Lipton
- 2022 : Disentangling the Mechanisms Behind Implicit Regularization in SGD »
  Zachary Novack · Simran Kaur · Tanya Marwah · Saurabh Garg · Zachary Lipton
- 2022 : Teaching Algorithmic Reasoning via In-context Learning »
  Hattie Zhou · Azade Nova · Aaron Courville · Hugo Larochelle · Behnam Neyshabur · Hanie Sedghi
- 2022 : Deconstructing Distributions: A Pointwise Framework of Learning »
  Gal Kaplun · Nikhil Ghosh · Saurabh Garg · Boaz Barak · Preetum Nakkiran
- 2022 : RLSBench: A Large-Scale Empirical Study of Domain Adaptation Under Relaxed Label Shift »
  Saurabh Garg · Nick Erickson · James Sharpnack · Alexander Smola · Sivaraman Balakrishnan · Zachary Lipton
- 2023 Poster: Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift »
  Saurabh Garg · Amrith Setlur · Zachary Lipton · Sivaraman Balakrishnan · Virginia Smith · Aditi Raghunathan
- 2023 Poster: Online Label Shift: Optimal Dynamic Regret meets Practical Algorithms »
  Dheeraj Baby · Saurabh Garg · Tzu-Ching Yen · Sivaraman Balakrishnan · Zachary Lipton · Yu-Xiang Wang
- 2023 Poster: (Almost) Provable Error Bounds Under Distribution Shift via Disagreement Discrepancy »
  Elan Rosenfeld · Saurabh Garg
- 2023 Workshop: Workshop on robustness of zero/few-shot learning in foundation models (R0-FoMo) »
  Ananth Balashankar · Saurabh Garg · Jindong Gu · Amrith Setlur · Yao Qin · Aditi Raghunathan · Ahmad Beirami
- 2022 : MATH-AI: Toward Human-Level Mathematical Reasoning »
  Francois Charton · Noah Goodman · Behnam Neyshabur · Talia Ringer · Daniel Selsam
- 2022 : Panel Discussion »
  Behnam Neyshabur · David Sontag · Pradeep Ravikumar · Erin Hartman
- 2022 : Length Generalization in Quantitative Reasoning »
  Behnam Neyshabur
- 2022 : Local Causal Discovery for Estimating Causal Effects »
  Shantanu Gupta · David Childers · Zachary Lipton
- 2022 Poster: Characterizing Datapoints via Second-Split Forgetting »
  Pratyush Maini · Saurabh Garg · Zachary Lipton · J. Zico Kolter
- 2022 Poster: Unsupervised Learning under Latent Label Shift »
  Manley Roberts · Pranav Mani · Saurabh Garg · Zachary Lipton
- 2022 Poster: Domain Adaptation under Open Set Label Shift »
  Saurabh Garg · Sivaraman Balakrishnan · Zachary Lipton
- 2022 Poster: Exploring Length Generalization in Large Language Models »
  Cem Anil · Yuhuai Wu · Anders Andreassen · Aitor Lewkowycz · Vedant Misra · Vinay Ramasesh · Ambrose Slone · Guy Gur-Ari · Ethan Dyer · Behnam Neyshabur
- 2022 Poster: Revisiting Neural Scaling Laws in Language and Vision »
  Ibrahim Alabdulmohsin · Behnam Neyshabur · Xiaohua Zhai
- 2022 Poster: Solving Quantitative Reasoning Problems with Language Models »
  Aitor Lewkowycz · Anders Andreassen · David Dohan · Ethan Dyer · Henryk Michalewski · Vinay Ramasesh · Ambrose Slone · Cem Anil · Imanol Schlag · Theo Gutman-Solo · Yuhuai Wu · Behnam Neyshabur · Guy Gur-Ari · Vedant Misra
- 2022 Poster: Block-Recurrent Transformers »
  DeLesley Hutchins · Imanol Schlag · Yuhuai Wu · Ethan Dyer · Behnam Neyshabur
- 2021 Poster: Efficient Online Estimation of Causal Effects by Deciding What to Observe »
  Shantanu Gupta · Zachary Lipton · David Childers
- 2021 Poster: Parametric Complexity Bounds for Approximating PDEs with Neural Networks »
  Tanya Marwah · Zachary Lipton · Andrej Risteski
- 2021 Poster: Mixture Proportion Estimation and PU Learning: A Modern Approach »
  Saurabh Garg · Yifan Wu · Alexander Smola · Sivaraman Balakrishnan · Zachary Lipton
- 2021 Poster: Deep Learning Through the Lens of Example Difficulty »
  Robert Baldock · Hartmut Maennel · Behnam Neyshabur
- 2021 Poster: Off-Policy Risk Assessment in Contextual Bandits »
  Audrey Huang · Liu Leqi · Zachary Lipton · Kamyar Azizzadenesheli
- 2021 Poster: Rebounding Bandits for Modeling Satiation Effects »
  Liu Leqi · Fatma Kilinc Karzan · Zachary Lipton · Alan Montgomery
- 2020 Poster: A Unified View of Label Shift Estimation »
  Saurabh Garg · Yifan Wu · Sivaraman Balakrishnan · Zachary Lipton
- 2020 Poster: On Learning Ising Models under Huber's Contamination Model »
  Adarsh Prasad · Vishwak Srinivasan · Sivaraman Balakrishnan · Pradeep Ravikumar
- 2020 Poster: What is being transferred in transfer learning? »
  Behnam Neyshabur · Hanie Sedghi · Chiyuan Zhang
- 2020 Poster: Towards Learning Convolutions from Scratch »
  Behnam Neyshabur
- 2019 : Panel - The Role of Communication at Large: Aparna Lakshmiratan, Jason Yosinski, Been Kim, Surya Ganguli, Finale Doshi-Velez »
  Aparna Lakshmiratan · Finale Doshi-Velez · Surya Ganguli · Zachary Lipton · Michela Paganini · Anima Anandkumar · Jason Yosinski
- 2019 : Lunch Break and Posters »
  Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu
- 2018 Poster: How Many Samples are Needed to Estimate a Convolutional Neural Network? »
  Simon Du · Yining Wang · Xiyu Zhai · Sivaraman Balakrishnan · Russ Salakhutdinov · Aarti Singh
- 2018 Poster: Optimization of Smooth Functions with Noisy Observations: Local Minimax Rates »
  Yining Wang · Sivaraman Balakrishnan · Aarti Singh
- 2017 : Contributed talk 1 - A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks »
  Behnam Neyshabur
- 2017 Poster: Exploring Generalization in Deep Learning »
  Behnam Neyshabur · Srinadh Bhojanapalli · David McAllester · Nati Srebro
- 2017 Poster: Implicit Regularization in Matrix Factorization »
  Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro
- 2017 Spotlight: Implicit Regularization in Matrix Factorization »
  Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro