Recently, the accuracy of image-text matching has been greatly improved by multimodal pretrained models, all of which are trained on millions or billions of paired images and texts. In contrast, this paper studies a new scenario, unpaired image-text matching, in which paired images and texts are assumed to be unavailable during model training. To deal with this, we propose a simple yet effective method named Multimodal Aligned Conceptual Knowledge (MACK), which is inspired by how knowledge is used in the human brain. It can be directly used as general knowledge to correlate images and texts even without model training, or further fine-tuned on unpaired images and texts to better generalize to particular datasets. In addition, we extend it as a re-ranking method, which can be easily combined with existing image-text matching models to substantially improve their performance.
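The abstract does not spell out how MACK-style re-ranking is combined with an existing matching model, but a common way to use one score to re-rank another is a weighted fusion of the two similarity lists. The sketch below illustrates that generic idea only; the function name `rerank`, the fusion weight `alpha`, and the linear-combination rule are all assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def rerank(base_scores, knowledge_scores, alpha=0.5):
    """Fuse a base model's image-text similarity scores with
    knowledge-based scores, then return candidate indices best-first.
    `alpha` is a hypothetical fusion weight, not taken from the paper."""
    fused = (1 - alpha) * np.asarray(base_scores) + alpha * np.asarray(knowledge_scores)
    # argsort on the negated scores gives a descending (best-first) ordering
    return np.argsort(-fused)

# Toy example: four candidate texts for one query image.
# The base model prefers candidate 0, the knowledge scores prefer
# candidate 1; the fused ranking promotes candidate 2, which both like.
order = rerank([0.9, 0.2, 0.5, 0.4], [0.1, 0.8, 0.6, 0.3], alpha=0.5)
print(order.tolist())  # → [2, 0, 1, 3]
```

The key design point of any such fusion is that the knowledge-based score can correct cases where the base model is confidently wrong, while the base score still dominates when the two agree.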
Author Information
Yan Huang (CRIPAC, CASIA)
Yuming Wang (Institute of Automation, Chinese Academy of Sciences)
Yunan Zeng (Institute of Automation, Chinese Academy of Sciences)
Liang Wang (NLPR, China)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: MACK: Multimodal Aligned Conceptual Knowledge for Unpaired Image-text Matching »
Thu. Dec 8th 05:00 -- 07:00 PM Room
More from the Same Authors
-
2021 Poster: Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision »
Keji He · Yan Huang · Qi Wu · Jianhua Yang · Dong An · Shuanglin Sima · Liang Wang
-
2020 Poster: Unfolding the Alternating Optimization for Blind Super Resolution »
Zhengxiong Luo · Yan Huang · Shang Li · Liang Wang · Tieniu Tan
-
2015 Poster: Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution »
Yan Huang · Wei Wang · Liang Wang
-
2013 Poster: Relevance Topic Model for Unstructured Social Group Activity Recognition »
Fang Zhao · Yongzhen Huang · Liang Wang · Tieniu Tan