Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks, and their zero-shot generalization ability is particularly exciting. While the top-5 zero-shot accuracies of these models are very high, the top-1 accuracies are much lower (over 25% gap in some cases). We investigate the reason for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts. First, we develop a simple and efficient zero-shot post-hoc method to identify images whose top-1 prediction is likely to be incorrect, by measuring the consistency of the predictions w.r.t. multiple prompts and image transformations. We show that our procedure better predicts mistakes, outperforming the popular max-logit baseline on selective prediction tasks. Next, we propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy; specifically, we use information from parents in the hierarchy to add superclasses to prompts, and information from children in the hierarchy to devise fine-grained prompts. We conduct experiments on both CLIP and LiT models with five different ImageNet-based datasets. For CLIP, our method improves the top-1 accuracy by 17.13% on the uncertain subset and 3.6% on the entire ImageNet validation set. We also show that our method yields consistent improvements on other ImageNet-shifted datasets and other model architectures such as LiT. Our proposed method is hyperparameter-free, requires no additional model training, and can be easily scaled to other large multi-modal architectures. Code for our experiments is opensourced at link hidden for anonymity.
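The uncertainty test described in the abstract can be sketched in a few lines. The following is a minimal, illustrative Python sketch, not the authors' implementation: it flags an image as uncertain whenever the top-1 class disagrees across score vectors produced under different prompt templates or image transformations (all function names and the example numbers are hypothetical).

```python
# Illustrative sketch of the consistency-based uncertainty check:
# an image is "uncertain" if the top-1 prediction changes across
# prompt/transformation variants. Names and scores are hypothetical.

def top1(scores):
    """Index of the highest score (argmax over a list of floats)."""
    return max(range(len(scores)), key=scores.__getitem__)

def is_uncertain(score_sets):
    """Return True if the top-1 class differs across variants.

    score_sets: one per-class score list per (prompt, transform)
    variant, e.g. image-text similarities from a CLIP-like model.
    """
    preds = {top1(s) for s in score_sets}
    return len(preds) > 1  # any disagreement -> likely top-1 mistake

# Example: three prompt/transform variants over four classes.
agree = [[0.10, 0.70, 0.10, 0.10],
         [0.20, 0.60, 0.10, 0.10],
         [0.00, 0.90, 0.05, 0.05]]
disagree = [[0.10, 0.70, 0.10, 0.10],
            [0.50, 0.30, 0.10, 0.10],
            [0.10, 0.20, 0.60, 0.10]]
print(is_uncertain(agree))     # False: all variants pick class 1
print(is_uncertain(disagree))  # True: candidates for re-prompting
```

Per the abstract, images flagged this way would then be re-queried with WordNet-derived prompts (superclass-augmented and fine-grained child prompts); that second stage is not shown here.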
Author Information
Yunhao Ge (University of Southern California)
I am a 2nd-year PhD student in iLab at the University of Southern California (USC), working with Prof. Laurent Itti. My research interests lie in computer vision, robotics, and general AI. Currently, I am focusing on simulating infant learning (reasoning, attention, imagination) using various learning algorithms (representation learning, adversarial learning, meta-learning, GNNs, reinforcement learning, etc.).
Jie Ren (Google Inc.)
Ming-Hsuan Yang (Google)
Yuxiao Wang (Google Research)
Andrew Gallagher (Google)
Hartwig Adam (Google)
Laurent Itti (University of Southern California (USC))
Balaji Lakshminarayanan (Google Brain)
Balaji Lakshminarayanan is a research scientist at Google Brain. Prior to that, he was a research scientist at DeepMind. He received his PhD from the Gatsby Unit, University College London where he worked with Yee Whye Teh. His recent research has focused on probabilistic deep learning, specifically, uncertainty estimation, out-of-distribution robustness and deep generative models. Notable contributions relevant to the tutorial include developing state-of-the-art methods for calibration under dataset shift (such as deep ensembles and AugMix) and showing that deep generative models do not always know what they don't know. He has co-organized several workshops on "Uncertainty and Robustness in deep learning" and served as Area Chair for NeurIPS, ICML, ICLR and AISTATS.
Jiaping Zhao (Google Inc.)
More from the Same Authors
- 2021 : STEP: Segmenting and Tracking Every Pixel
  Mark Weber · Jun Xie · Maxwell Collins · Yukun Zhu · Paul Voigtlaender · Hartwig Adam · Bradley Green · Andreas Geiger · Bastian Leibe · Daniel Cremers · Aljosa Osep · Laura Leal-Taixé · Liang-Chieh Chen
- 2022 : Out-of-Distribution Detection and Selective Generation for Conditional Language Models
  Jie Ren · Jiaming Luo · Yao Zhao · Kundan Krishna · Mohammad Saleh · Balaji Lakshminarayanan · Peter Liu
- 2022 : Reliability benchmarks for image segmentation
  Estefany Kelly Buchanan · Michael Dusenberry · Jie Ren · Kevin Murphy · Balaji Lakshminarayanan · Dustin Tran
- 2022 : Pushing the Accuracy-Fairness Tradeoff Frontier with Introspective Self-play
  Jeremiah Liu · Krishnamurthy Dvijotham · Jihyeon Lee · Quan Yuan · Martin Strobel · Balaji Lakshminarayanan · Deepak Ramachandran
- 2022 : Improving the Robustness of Conditional Language Models by Detecting and Removing Input Noise
  Kundan Krishna · Yao Zhao · Jie Ren · Balaji Lakshminarayanan · Jiaming Luo · Mohammad Saleh · Peter Liu
- 2023 Poster: 3D Copy-Paste: Physical Plausible Indoor Object Insertion for Monocular 3D Object Detection
  Yunhao Ge · Hong-Xing Yu · Cheng Zhao · Yuliang Guo · Xinyu Huang · Liu Ren · Laurent Itti · Jiajun Wu
- 2023 Poster: RoboCLIP: One Demonstration is Enough to Learn Robot Policies
  Sumedh Sontakke · Séb Arnold · Jesse Zhang · Karl Pertsch · Erdem Bıyık · Dorsa Sadigh · Chelsea Finn · Laurent Itti
- 2022 Poster: Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation
  Yao Qin · Chiyuan Zhang · Ting Chen · Balaji Lakshminarayanan · Alex Beutel · Xuezhi Wang
- 2020 Tutorial: (Track2) Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning Q&A
  Dustin Tran · Balaji Lakshminarayanan · Jasper Snoek
- 2020 Tutorial: (Track2) Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning
  Dustin Tran · Balaji Lakshminarayanan · Jasper Snoek
- 2019 Poster: Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift
  Jasper Snoek · Yaniv Ovadia · Emily Fertig · Balaji Lakshminarayanan · Sebastian Nowozin · D. Sculley · Joshua Dillon · Jie Ren · Zachary Nado
- 2018 Poster: Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
  Liang-Chieh Chen · Maxwell Collins · Yukun Zhu · George Papandreou · Barret Zoph · Florian Schroff · Hartwig Adam · Jonathon Shlens
- 2017 : Google Lens
  Hartwig Adam
- 2017 : Poster session (and Coffee Break)
  Jacob Andreas · Kun Li · Conner Vercellino · Thomas Miconi · Wenpeng Zhang · Luca Franceschi · Zheng Xiong · Karim Ahmed · Laurent Itti · Tim Klinger · Mostafa Rohaninejad
- 2017 Poster: Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
  Balaji Lakshminarayanan · Alexander Pritzel · Charles Blundell
- 2017 Spotlight: Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
  Balaji Lakshminarayanan · Alexander Pritzel · Charles Blundell
- 2013 Poster: Bayesian optimization explains human active search
  Ali Borji · Laurent Itti