Timezone: »
Many practical settings allow a learner to defer predictions to one or more costly experts. For example, the learning to defer paradigm allows a learner to defer to a human expert, at some monetary cost. Similarly, the adaptive inference paradigm allows a base model to defer to one or more large models, at some computational cost. The goal in these settings is to learn classification and deferral mechanisms to optimise a suitable accuracy-cost tradeoff. To achieve this, a central issue studied in prior work is the design of a coherent loss function for both mechanisms. In this work, we demonstrate that existing losses have two subtle limitations: they can encourage underfitting when there is a high cost of deferring, and the deferral function can have a weak dependence on the base model predictions. To resolve these issues, we propose a post-hoc training scheme: we train a deferral function on top of a base model, with the objective of predicting to defer when the base model's error probability exceeds the cost of the expert model. This may be viewed as applying a partial surrogate to the ideal deferral loss, which can lead to a tighter approximation and thus better performance. Empirically, we verify the efficacy of post-hoc training on benchmarks for learning to defer and adaptive inference.
Author Information
Harikrishna Narasimhan (Google Research)
Wittawat Jitkrittum (Google Research)
Aditya Menon (Google)
Ankit Rawat (Google Research)
Sanjiv Kumar (Google Research)
More from the Same Authors
-
2021 : An Empirical Study of Pre-trained Models on Out-of-distribution Generalization »
Yaodong Yu · Heinrich Jiang · Dara Bahri · Hossein Mobahi · Seungyeon Kim · Ankit Rawat · Andreas Veit · Yi Ma -
2022 : Effect of mixup Training on Representation Learning »
Arslan Chaudhry · Aditya Menon · Andreas Veit · Sadeep Jayasumana · Srikumar Ramalingam · Sanjiv Kumar -
2023 Poster: SOAR: Improved Quantization for Nearest Neighbor Search »
Philip Sun · David Simcha · Dave Dopson · Ruiqi Guo · Sanjiv Kumar -
2023 Poster: ResMem: Learn what you can and memorize the rest »
Zitong Yang · MICHAL LUKASIK · Vaishnavh Nagarajan · Zonglin Li · Ankit Rawat · Manzil Zaheer · Aditya Menon · Sanjiv Kumar -
2023 Poster: On student-teacher deviations in distillation: does it pay to disobey? »
Vaishnavh Nagarajan · Aditya Menon · Srinadh Bhojanapalli · Hossein Mobahi · Sanjiv Kumar -
2023 Poster: When Does Confidence-Based Cascade Deferral Suffice? »
Wittawat Jitkrittum · Neha Gupta · Aditya Menon · Harikrishna Narasimhan · Ankit Rawat · Sanjiv Kumar -
2022 Poster: TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s »
Felix Chern · Blake Hechtman · Andy Davis · Ruiqi Guo · David Majnemer · Sanjiv Kumar -
2022 Poster: Decoupled Context Processing for Context Augmented Language Modeling »
Zonglin Li · Ruiqi Guo · Sanjiv Kumar -
2022 Poster: A Fourier Approach to Mixture Learning »
Mingda Qiao · Guru Guruganesh · Ankit Rawat · Kumar Avinava Dubey · Manzil Zaheer -
2021 Poster: Batch Active Learning at Scale »
Gui Citovsky · Giulia DeSalvo · Claudio Gentile · Lazaros Karydas · Anand Rajagopalan · Afshin Rostamizadeh · Sanjiv Kumar -
2021 Poster: Training Over-parameterized Models with Non-decomposable Objectives »
Harikrishna Narasimhan · Aditya Menon -
2021 Poster: Efficient Training of Retrieval Models using Negative Cache »
Erik Lindgren · Sashank Reddi · Ruiqi Guo · Sanjiv Kumar -
2020 Poster: Learning Kernel Tests Without Data Splitting »
Jonas Kübler · Wittawat Jitkrittum · Bernhard Schölkopf · Krikamol Muandet -
2020 Poster: Approximate Heavily-Constrained Learning with Lagrange Multiplier Models »
Harikrishna Narasimhan · Andrew Cotter · Yichen Zhou · Serena Wang · Wenshuo Guo -
2020 Poster: Why are Adaptive Methods Good for Attention Models? »
Jingzhao Zhang · Sai Praneeth Karimireddy · Andreas Veit · Seungyeon Kim · Sashank Reddi · Sanjiv Kumar · Suvrit Sra -
2020 Poster: Fair Performance Metric Elicitation »
Gaurush Hiranandani · Harikrishna Narasimhan · Sanmi Koyejo -
2020 Poster: Consistent Plug-in Classifiers for Complex Objectives and Constraints »
Shiv Kumar Tavker · Harish Guruprasad Ramaswamy · Harikrishna Narasimhan -
2020 Poster: Robust Optimization for Fairness with Noisy Protected Groups »
Serena Wang · Wenshuo Guo · Harikrishna Narasimhan · Andrew Cotter · Maya Gupta · Michael Jordan -
2020 Poster: Multi-Stage Influence Function »
Hongge Chen · Si Si · Yang Li · Ciprian Chelba · Sanjiv Kumar · Duane Boning · Cho-Jui Hsieh -
2020 Poster: O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers »
Chulhee Yun · Yin-Wen Chang · Srinadh Bhojanapalli · Ankit Singh Rawat · Sashank Reddi · Sanjiv Kumar -
2020 Poster: Robust large-margin learning in hyperbolic space »
Melanie Weber · Manzil Zaheer · Ankit Singh Rawat · Aditya Menon · Sanjiv Kumar -
2020 Poster: Learning discrete distributions: user vs item-level privacy »
Yuhan Liu · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Michael D Riley -
2019 Poster: Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces »
Chuan Guo · Ali Mousavi · Xiang Wu · Daniel Holtmann-Rice · Satyen Kale · Sashank Reddi · Sanjiv Kumar -
2019 Poster: Optimizing Generalized Rate Metrics with Three Players »
Harikrishna Narasimhan · Andrew Cotter · Maya Gupta -
2019 Poster: Noise-tolerant fair classification »
Alex Lamy · Ziyuan Zhong · Aditya Menon · Nakul Verma -
2019 Oral: Optimizing Generalized Rate Metrics with Three Players »
Harikrishna Narasimhan · Andrew Cotter · Maya Gupta -
2019 Poster: On Making Stochastic Classifiers Deterministic »
Andrew Cotter · Maya Gupta · Harikrishna Narasimhan -
2019 Poster: Multilabel reductions: what is my loss optimising? »
Aditya Menon · Ankit Singh Rawat · Sashank Reddi · Sanjiv Kumar -
2019 Spotlight: Multilabel reductions: what is my loss optimising? »
Aditya Menon · Ankit Singh Rawat · Sashank Reddi · Sanjiv Kumar -
2019 Oral: On Making Stochastic Classifiers Deterministic »
Andrew Cotter · Maya Gupta · Harikrishna Narasimhan -
2019 Poster: Sampled Softmax with Random Fourier Features »
Ankit Singh Rawat · Jiecao Chen · Felix Xinnan Yu · Ananda Theertha Suresh · Sanjiv Kumar -
2019 Poster: Kernel Stein Tests for Multiple Model Comparison »
Jen Ning Lim · Makoto Yamada · Bernhard Schölkopf · Wittawat Jitkrittum -
2019 Poster: Fisher Efficient Inference of Intractable Models »
Song Liu · Takafumi Kanamori · Wittawat Jitkrittum · Yu Chen -
2019 Tutorial: Interpretable Comparison of Distributions and Models »
Wittawat Jitkrittum · Danica J. Sutherland · Arthur Gretton -
2018 Poster: Informative Features for Model Comparison »
Wittawat Jitkrittum · Heishiro Kanagawa · Patsorn Sangkloy · James Hays · Bernhard Schölkopf · Arthur Gretton -
2018 Poster: Adaptive Methods for Nonconvex Optimization »
Manzil Zaheer · Sashank Reddi · Devendra S Sachan · Satyen Kale · Sanjiv Kumar -
2018 Poster: cpSGD: Communication-efficient and differentially-private distributed SGD »
Naman Agarwal · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2018 Spotlight: cpSGD: Communication-efficient and differentially-private distributed SGD »
Naman Agarwal · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2017 : Now Playing: Continuous low-power music recognition »
Marvin Ritter · Ruiqi Guo · Sanjiv Kumar · Julian J Odell · Mihajlo Velimirović · Dominik Roblek · James Lyon -
2017 : A Linear-Time Kernel Goodness-of-Fit Test (NIPS best paper) »
Wittawat Jitkrittum -
2017 Poster: A Linear-Time Kernel Goodness-of-Fit Test »
Wittawat Jitkrittum · Wenkai Xu · Zoltan Szabo · Kenji Fukumizu · Arthur Gretton -
2017 Poster: Multiscale Quantization for Fast Similarity Search »
Xiang Wu · Ruiqi Guo · Ananda Theertha Suresh · Sanjiv Kumar · Daniel Holtmann-Rice · David Simcha · Felix Yu -
2017 Oral: A Linear-Time Kernel Goodness-of-Fit Test »
Wittawat Jitkrittum · Wenkai Xu · Zoltan Szabo · Kenji Fukumizu · Arthur Gretton -
2016 Oral: Interpretable Distribution Features with Maximum Testing Power »
Wittawat Jitkrittum · Zoltán Szabó · Kacper P Chwialkowski · Arthur Gretton -
2016 Poster: Orthogonal Random Features »
Felix Xinnan Yu · Ananda Theertha Suresh · Krzysztof M Choromanski · Daniel Holtmann-Rice · Sanjiv Kumar -
2016 Poster: Interpretable Distribution Features with Maximum Testing Power »
Wittawat Jitkrittum · Zoltán Szabó · Kacper P Chwialkowski · Arthur Gretton -
2016 Oral: Orthogonal Random Features »
Felix Xinnan Yu · Ananda Theertha Suresh · Krzysztof M Choromanski · Daniel Holtmann-Rice · Sanjiv Kumar -
2015 Workshop: The 1st International Workshop "Feature Extraction: Modern Questions and Challenges" »
Dmitry Storcheus · Sanjiv Kumar · Afshin Rostamizadeh -
2015 Poster: Bayesian Manifold Learning: The Locally Linear Latent Variable Model (LL-LVM) »
Mijung Park · Wittawat Jitkrittum · Ahmad Qamar · Zoltan Szabo · Lars Buesing · Maneesh Sahani -
2015 Poster: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar -
2015 Spotlight: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar -
2015 Poster: Structured Transforms for Small-Footprint Deep Learning »
Vikas Sindhwani · Tara Sainath · Sanjiv Kumar -
2015 Spotlight: Structured Transforms for Small-Footprint Deep Learning »
Vikas Sindhwani · Tara Sainath · Sanjiv Kumar -
2014 Session: Oral Session 8 »
Sanjiv Kumar -
2014 Poster: Discrete Graph Hashing »
Wei Liu · Cun Mu · Sanjiv Kumar · Shih-Fu Chang -
2014 Spotlight: Discrete Graph Hashing »
Wei Liu · Cun Mu · Sanjiv Kumar · Shih-Fu Chang -
2012 Poster: Angular Quantization based Binary Codes for Fast Similarity Search »
Yunchao Gong · Sanjiv Kumar · Vishal Verma · Svetlana Lazebnik -
2009 Poster: Ensemble Nystrom Method »
Sanjiv Kumar · Mehryar Mohri · Ameet S Talwalkar