Reasoning about entities and their relationships from multimodal data is a key goal of Artificial General Intelligence. The visual question answering (VQA) problem is an excellent way to test such reasoning capabilities of an AI model and its multimodal representation learning. However, current VQA models are oversimplified deep neural networks, composed of a long short-term memory (LSTM) unit for question comprehension and a convolutional neural network (CNN) for learning a single image representation. We argue that a single visual representation contains only limited, general information about the image contents and thus limits the model's reasoning capabilities. In this work, we introduce a modular neural network model that learns a multimodal and multifaceted representation of the image and the question. The proposed model learns to use the multimodal representation to reason about the image entities and achieves new state-of-the-art performance on both VQA benchmark datasets, VQA v1.0 and v2.0, by a wide margin.
Author Information
Ilija Ilievski (National University of Singapore)
Ilija is a machine learning researcher building holistic models of unstructured data from multiple modalities. His diverse six years of experience as a machine learning researcher include projects on combining satellite images and census data for complex city models, utilizing movie metadata and watch statistics for recommender systems, and fusing image and text data representations for visual question answering. Currently, Ilija is developing a unified model of financial data from multiple sources, applied to portfolio optimization.
Jiashi Feng (National University of Singapore)
More from the Same Authors
- 2021 Workshop: Distribution shifts: connecting methods and applications (DistShift)
  Shiori Sagawa · Pang Wei Koh · Fanny Yang · Hongseok Namkoong · Jiashi Feng · Kate Saenko · Percy Liang · Sarah Bird · Sergey Levine
- 2020 Poster: Towards Theoretically Understanding Why SGD Generalizes Better Than Adam in Deep Learning
  Pan Zhou · Jiashi Feng · Chao Ma · Caiming Xiong · Steven Chu Hong Hoi · Weinan E
- 2020 Poster: Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts
  Guilin Li · Junlei Zhang · Yunhe Wang · Chuanjian Liu · Matthias Tan · Yunfeng Lin · Wei Zhang · Jiashi Feng · Tong Zhang
- 2020 Poster: Improving Generalization in Reinforcement Learning with Mixture Regularization
  Kaixin Wang · Bingyi Kang · Jie Shao · Jiashi Feng
- 2020 Poster: Inference Stage Optimization for Cross-scenario 3D Human Pose Estimation
  Jianfeng Zhang · Xuecheng Nie · Jiashi Feng
- 2020 Poster: ConvBERT: Improving BERT with Span-based Dynamic Convolution
  Zi-Hang Jiang · Weihao Yu · Daquan Zhou · Yunpeng Chen · Jiashi Feng · Shuicheng Yan
- 2020 Spotlight: ConvBERT: Improving BERT with Span-based Dynamic Convolution
  Zi-Hang Jiang · Weihao Yu · Daquan Zhou · Yunpeng Chen · Jiashi Feng · Shuicheng Yan
- 2019 Poster: Efficient Meta Learning via Minibatch Proximal Update
  Pan Zhou · Xiaotong Yuan · Huan Xu · Shuicheng Yan · Jiashi Feng
- 2019 Spotlight: Efficient Meta Learning via Minibatch Proximal Update
  Pan Zhou · Xiaotong Yuan · Huan Xu · Shuicheng Yan · Jiashi Feng
- 2018 Poster: New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity
  Pan Zhou · Xiaotong Yuan · Jiashi Feng
- 2018 Poster: Efficient Stochastic Gradient Hard Thresholding
  Pan Zhou · Xiaotong Yuan · Jiashi Feng
- 2018 Poster: A^2-Nets: Double Attention Networks
  Yunpeng Chen · Yannis Kalantidis · Jianshu Li · Shuicheng Yan · Jiashi Feng
- 2017 Poster: Dual Path Networks
  Yunpeng Chen · Jianan Li · Huaxin Xiao · Xiaojie Jin · Shuicheng Yan · Jiashi Feng
- 2017 Spotlight: Dual Path Networks
  Yunpeng Chen · Jianan Li · Huaxin Xiao · Xiaojie Jin · Shuicheng Yan · Jiashi Feng
- 2017 Poster: Predicting Scene Parsing and Motion Dynamics in the Future
  Xiaojie Jin · Huaxin Xiao · Xiaohui Shen · Jimei Yang · Zhe Lin · Yunpeng Chen · Zequn Jie · Jiashi Feng · Shuicheng Yan
- 2017 Poster: Dual-Agent GANs for Photorealistic and Identity Preserving Profile Face Synthesis
  Jian Zhao · Lin Xiong · Panasonic Karlekar Jayashree · Jianshu Li · Fang Zhao · Zhecan Wang · Panasonic Sugiri Pranata · Panasonic Shengmei Shen · Shuicheng Yan · Jiashi Feng
- 2016 Poster: Tree-Structured Reinforcement Learning for Sequential Object Localization
  Zequn Jie · Xiaodan Liang · Jiashi Feng · Xiaojie Jin · Wen Lu · Shuicheng Yan