Neural module networks (NMN) are a popular approach for solving multi-modal tasks such as visual question answering (VQA) and visual referring expression recognition (REF). A key limitation of prior NMN implementations is that the neural modules do not effectively capture the association between the visual input and the relevant neighbourhood context of the textual input. This limits their generalizability. For instance, NMN fail to understand new concepts such as “yellow sphere to the left” even when they are a combination of known concepts from the training data: “blue sphere”, “yellow cube”, and “metallic cube to the left”. In this paper, we address this limitation by introducing a language-guided adaptive convolution layer (LG-Conv) into NMN, in which the filter weights of convolutions are explicitly multiplied with a spatially varying language-guided kernel. Our model allows the neural module to adaptively co-attend over potential objects of interest from the visual and textual inputs. Extensive experiments on VQA and REF tasks demonstrate the effectiveness of our approach. Additionally, we propose a new challenging out-of-distribution test split for the REF task, which we call C3-Ref+, for explicitly evaluating the NMN’s ability to generalize to adversarial perturbations and unseen combinations of known concepts. Experiments on C3-Ref+ further demonstrate the generalization capabilities of our approach.
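The idea of multiplying convolution filter weights by a spatially varying, language-guided kernel can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the kernel's form, so the sketch assumes a Gaussian kernel on the distance between language-modulated features at neighbouring positions, in the style of pixel-adaptive convolutions. The names `lg_conv`, `proj`, and `text_vec`, and the way text is fused with visual features, are hypothetical choices made for illustration only.

```python
import numpy as np

def lg_conv(image_feats, text_vec, weights, proj):
    """Illustrative language-guided adaptive convolution (assumed form).

    image_feats: (H, W, C) visual feature map
    text_vec:    (D,) text embedding
    weights:     (k, k, C, C_out) shared convolution filter, k odd
    proj:        (D, C) hypothetical projection of text into feature space
    """
    H, W, C = image_feats.shape
    k = weights.shape[0]
    r = k // 2

    # Assumption: guidance features are visual features modulated
    # channel-wise by the projected text embedding.
    guide = image_feats * (text_vec @ proj)            # (H, W, C)

    # Zero-pad so the output keeps the input's spatial size.
    padded = np.pad(image_feats, ((r, r), (r, r), (0, 0)))
    guide_p = np.pad(guide, ((r, r), (r, r), (0, 0)))

    out = np.zeros((H, W, weights.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k]           # (k, k, C)
            g_patch = guide_p[i:i + k, j:j + k]        # (k, k, C)
            # Assumption: Gaussian kernel on guidance-feature distance
            # between each neighbour and the window centre.
            d2 = ((g_patch - guide[i, j]) ** 2).sum(axis=-1)   # (k, k)
            kern = np.exp(-0.5 * d2)
            # Filter weights multiplied by the spatially varying kernel.
            adapted = weights * kern[:, :, None, None]
            out[i, j] = np.einsum('abc,abco->o', patch, adapted)
    return out
```

Under these assumptions, a zero text embedding makes the kernel equal to one everywhere, so the layer reduces to an ordinary zero-padded convolution; a non-zero embedding down-weights neighbours whose language-modulated features differ from the centre position.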
Author Information
Arjun Akula (University of California, Los Angeles)
Varun Jampani (Google)
Soravit Changpinyo (University of Southern California (USC))
Song-Chun Zhu (UCLA)
More from the Same Authors
- 2021 Spotlight: ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction
  Gengshan Yang · Deqing Sun · Varun Jampani · Daniel Vlasic · Forrester Cole · Ce Liu · Deva Ramanan
- 2021 : IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
  Pan Lu · Liang Qiu · Jiaqi Chen · Tanglin Xia · Yizhou Zhao · Wei Zhang · Zhou Yu · Xiaodan Liang · Song-Chun Zhu
- 2022 Poster: LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery
  Chun-Han Yao · Wei-Chih Hung · Yuanzhen Li · Michael Rubinstein · Ming-Hsuan Yang · Varun Jampani
- 2022 Poster: SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections
  Mark Boss · Andreas Engelhardt · Abhishek Kar · Yuanzhen Li · Deqing Sun · Jonathan Barron · Hendrik PA Lensch · Varun Jampani
- 2022 Poster: Subsidiary Prototype Alignment for Universal Domain Adaptation
  Jogendra Nath Kundu · Suvaansh Bhambri · Akshay R Kulkarni · Hiran Sarkar · Varun Jampani · Venkatesh Babu R
- 2022 Poster: Polynomial Neural Fields for Subband Decomposition and Manipulation
  Guandao Yang · Sagie Benaim · Varun Jampani · Kyle Genova · Jonathan Barron · Thomas Funkhouser · Bharath Hariharan · Serge Belongie
- 2021 : Solving Math Problems by Joint Parsing and Cognitive Reasoning
  Song-Chun Zhu
- 2021 Poster: On Path Integration of Grid Cells: Group Representation and Isotropic Scaling
  Ruiqi Gao · Jianwen Xie · Xue-Xin Wei · Song-Chun Zhu · Ying Nian Wu
- 2021 Poster: Neural-PIL: Neural Pre-Integrated Lighting for Reflectance Decomposition
  Mark Boss · Varun Jampani · Raphael Braun · Ce Liu · Jonathan Barron · Hendrik PA Lensch
- 2021 Poster: ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction
  Gengshan Yang · Deqing Sun · Varun Jampani · Daniel Vlasic · Forrester Cole · Ce Liu · Deva Ramanan
- 2021 Poster: Iterative Teacher-Aware Learning
  Luyao Yuan · Dongruo Zhou · Junhong Shen · Jingdong Gao · Jeffrey L Chen · Quanquan Gu · Ying Nian Wu · Song-Chun Zhu
- 2021 Poster: Unsupervised Foreground Extraction via Deep Region Competition
  Peiyu Yu · Sirui Xie · Xiaojian (Shawn) Ma · Yixin Zhu · Ying Nian Wu · Song-Chun Zhu
- 2021 Poster: On Model Calibration for Long-Tailed Object Detection and Instance Segmentation
  Tai-Yu Pan · Cheng Zhang · Yandong Li · Hexiang Hu · Dong Xuan · Soravit Changpinyo · Boqing Gong · Wei-Lun Chao
- 2021 Poster: Non-local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation
  Jogendra Nath Kundu · Siddharth Seth · Anirudh Jamkhandi · Pradyumna YM · Varun Jampani · Anirban Chakraborty · Venkatesh Babu R
- 2021 Poster: Aligning Silhouette Topology for Self-Adaptive 3D Human Pose Recovery
  Ramesha Rakesh Mugaludi · Jogendra Nath Kundu · Varun Jampani · Venkatesh Babu R
- 2020 Poster: Generative View Synthesis: From Single-view Semantics to Novel-view Images
  Tewodros Amberbir Habtegebrial · Varun Jampani · Orazio Gallo · Didier Stricker
- 2013 Poster: Similarity Component Analysis
  Soravit Changpinyo · Kuan Liu · Fei Sha