Visual Question Answering (VQA) is the task of answering questions about an image. VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a large drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings.
We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use both input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. It prevents the base VQA model from learning them by influencing its predictions, which dynamically adjusts the loss to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2, a dataset specifically designed to assess the robustness of VQA models when exposed to different question biases at test time than those seen during training.
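The core idea can be illustrated with a small numeric sketch. Following the abstract, a question-only branch produces a mask (via a sigmoid) that modulates the base VQA model's logits during training; on strongly biased examples, the fused prediction already agrees with the biased answer, so the training loss (and hence the gradient) on those examples shrinks. The function and variable names below are illustrative, not the authors' actual implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rubi_fusion(vqa_logits, question_only_logits):
    """Modulate the base model's logits with a question-only mask.

    A sigmoid over the question-only logits yields values in (0, 1);
    multiplying them into the VQA logits boosts answers the language
    prior already favors, lowering the loss on biased examples.
    """
    mask = sigmoid(question_only_logits)
    return vqa_logits * mask

def cross_entropy(logits, label):
    """Standard softmax cross-entropy for a single example."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[label])

# A biased example: the question alone strongly predicts answer 0.
vqa_logits = np.array([2.0, 1.0, 0.0])
qonly_logits = np.array([5.0, -5.0, -5.0])  # confident language prior

fused = rubi_fusion(vqa_logits, qonly_logits)
loss_base = cross_entropy(vqa_logits, label=0)
loss_fused = cross_entropy(fused, label=0)

# The fused loss is lower, so this biased example contributes a
# smaller gradient to the base model's parameters.
print(loss_fused < loss_base)
```

At test time the question-only branch is discarded and only the base VQA model's unmasked predictions are used, so the reduced reliance on language priors carries over to out-of-distribution evaluation.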
Author Information
Remi Cadene (Sorbonne University - LIP6)
Corentin Dancette (Sorbonne Université)
Hedi Ben younes (Université Pierre & Marie Curie / Heuritech)
Matthieu Cord (Sorbonne University)
Devi Parikh (Georgia Tech / Facebook AI Research (FAIR))
More from the Same Authors
-
2020 : Paper 16: Driving Behavior Explanation with Multi-level Fusion »
Matthieu Cord · Patrick Pérez -
2022 Poster: What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods »
Julien Colin · Thomas FEL · Remi Cadene · Thomas Serre -
2021 : AI for Augmenting Human Creativity »
Devi Parikh -
2021 : Career and Life: Panel Discussion - Bo Li, Adriana Romero-Soriano, Devi Parikh, and Emily Denton »
Emily Denton · Devi Parikh · Bo Li · Adriana Romero -
2021 Poster: RED : Looking for Redundancies for Data-Free Structured Compression of Deep Neural Networks »
Edouard YVINEC · Arnaud Dapogny · Matthieu Cord · Kevin Bailly -
2021 Poster: Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis »
Thomas FEL · Remi Cadene · Mathieu Chalvidal · Matthieu Cord · David Vigouroux · Thomas Serre -
2021 : Open Catalyst Challenge + Q&A »
Abhishek Das · Muhammed Shuaibi · Siddharth Goyal · Adeesh Kolluru · Janice Lan · Aini Palizhati · Anuroop Sriram · Brandon Wood · Aditya Grover · Devi Parikh · Zachary Ulissi · Larry Zitnick -
2021 Poster: Human-Adversarial Visual Question Answering »
Sasha Sheng · Amanpreet Singh · Vedanuj Goswami · Jose Magana · Tristan Thrush · Wojciech Galuba · Devi Parikh · Douwe Kiela -
2020 Poster: Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data »
Michael Cogswell · Jiasen Lu · Rishabh Jain · Stefan Lee · Devi Parikh · Dhruv Batra -
2020 : Discussion Panel: Hugo Larochelle, Finale Doshi-Velez, Devi Parikh, Marc Deisenroth, Julien Mairal, Katja Hofmann, Phillip Isola, and Michael Bowling »
Hugo Larochelle · Finale Doshi-Velez · Marc Deisenroth · Devi Parikh · Julien Mairal · Katja Hofmann · Phillip Isola · Michael Bowling -
2019 Poster: Cross-channel Communication Networks »
Jianwei Yang · Zhile Ren · Chuang Gan · Hongyuan Zhu · Devi Parikh -
2019 Poster: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks »
Jiasen Lu · Dhruv Batra · Devi Parikh · Stefan Lee -
2019 Poster: Zero-Shot Semantic Segmentation »
Maxime Bucher · Tuan-Hung VU · Matthieu Cord · Patrick Pérez -
2019 Poster: Addressing Failure Prediction by Learning Model Confidence »
Charles Corbière · Nicolas THOME · Avner Bar-Hen · Matthieu Cord · Patrick Pérez -
2019 Poster: Riemannian batch normalization for SPD neural networks »
Daniel Brooks · Olivier Schwander · Frederic Barbaresco · Jean-Yves Schneider · Matthieu Cord -
2019 Poster: Chasing Ghosts: Instruction Following as Bayesian State Tracking »
Peter Anderson · Ayush Shrivastava · Devi Parikh · Dhruv Batra · Stefan Lee -
2018 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Erik Wijmans · Samyak Datta · Ethan Perez · Mateusz Malinowski · Stefan Lee · Peter Anderson · Aaron Courville · Jeremie MARY · Dhruv Batra · Devi Parikh · Olivier Pietquin · Chiori HORI · Tim Marks · Anoop Cherian -
2018 Poster: Revisiting Multi-Task Learning with ROCK: a Deep Residual Auxiliary Block for Visual Detection »
Taylor Mordan · Nicolas THOME · Gilles Henaff · Matthieu Cord -
2017 : Panel Discussion »
Felix Hill · Olivier Pietquin · Jack Gallant · Raymond Mooney · Sanja Fidler · Chen Yu · Devi Parikh -
2017 : Towards Embodied Question Answering »
Devi Parikh -
2017 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Abhishek Das · Satwik Kottur · Stefan Lee · Mateusz Malinowski · Olivier Pietquin · Devi Parikh · Dhruv Batra · Aaron Courville · Jeremie Mary -
2017 Poster: Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model »
Jiasen Lu · Anitha Kannan · Jianwei Yang · Devi Parikh · Dhruv Batra -
2016 Poster: Hierarchical Question-Image Co-Attention for Visual Question Answering »
Jiasen Lu · Jianwei Yang · Dhruv Batra · Devi Parikh -
2013 Poster: Top-Down Regularization of Deep Belief Networks »
Hanlin Goh · Nicolas Thome · Matthieu Cord · Joo-Hwee Lim -
2011 Poster: Understanding the Intrinsic Memorability of Images »
Phillip Isola · Devi Parikh · Antonio Torralba · Aude Oliva