Multimodal few-shot learning is challenging due to the large domain gap between the vision and language modalities. In an effort to bridge this gap, we introduce a meta-learning approach to multimodal few-shot learning, leveraging its strong ability to accrue knowledge across tasks. The full model is built on frozen foundation vision and language models, exploiting their already-learned capacity. To translate the visual features into the latent space of the language model, we introduce a lightweight meta-mapper, acting as a meta-learner. By updating only the parameters of the meta-mapper, our model learns to quickly adapt to unseen samples with only a few gradient updates. Unlike prior multimodal few-shot learners, which require hand-engineered task induction, our model induces the task in a completely data-driven manner. Experiments on recent multimodal few-shot benchmarks demonstrate that our meta-learning approach yields better multimodal few-shot learners while being more computationally efficient than its counterparts.
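To make the adaptation mechanism concrete, below is a minimal sketch of the idea described in the abstract: a small mapper projects frozen visual features into the embedding space of a frozen language model, and only the mapper's parameters are updated with a few gradient steps per task, MAML-style. This is not the authors' implementation; the module name, dimensions, hyperparameters, and the lm_loss_fn helper are illustrative assumptions.

```python
# Sketch (assumed names and shapes, not the paper's code): a lightweight
# meta-mapper that turns a frozen visual feature into a short prefix of
# language-model token embeddings, plus a MAML-style inner loop that
# adapts only the mapper. The vision and language backbones stay frozen.
import torch
import torch.nn as nn


class MetaMapper(nn.Module):
    """Maps a visual feature vector to `prefix_len` LM embedding tokens."""

    def __init__(self, vis_dim=512, lm_dim=768, prefix_len=4):
        super().__init__()
        self.prefix_len = prefix_len
        self.proj = nn.Linear(vis_dim, lm_dim * prefix_len)

    def forward(self, vis_feats):                      # (B, vis_dim)
        prefix = self.proj(vis_feats)                  # (B, lm_dim * prefix_len)
        return prefix.view(vis_feats.size(0), self.prefix_len, -1)


def adapt_on_task(mapper, support_feats, support_labels, lm_loss_fn,
                  inner_steps=3, inner_lr=1e-2):
    """Inner loop: a few gradient updates on the mapper parameters only.

    `lm_loss_fn` is a hypothetical callable that feeds the prefix tokens
    into the frozen language model and returns the captioning/VQA loss.
    """
    # Clone the mapper's parameters as differentiable "fast weights".
    fast_weights = {n: p.clone() for n, p in mapper.named_parameters()}
    for _ in range(inner_steps):
        # Run the mapper functionally with the current fast weights.
        prefix = torch.func.functional_call(mapper, fast_weights,
                                            (support_feats,))
        loss = lm_loss_fn(prefix, support_labels)
        # create_graph=True keeps the graph for the second-order outer loop.
        grads = torch.autograd.grad(loss, list(fast_weights.values()),
                                    create_graph=True)
        fast_weights = {n: w - inner_lr * g
                        for (n, w), g in zip(fast_weights.items(), grads)}
    return fast_weights
```

In a full episodic training loop, the returned fast weights would be evaluated on the task's query set, and the resulting query loss backpropagated to update the mapper's initialization across tasks, which is how the data-driven task induction described above would be learned.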
Author Information
Ivona Najdenkoska (University of Amsterdam)
Xiantong Zhen (United Imaging Healthcare)
Marcel Worring (University of Amsterdam)
More from the Same Authors
- 2022 Poster: Association Graph Learning for Multi-Task Classification with Category Shifts
  Jiayi Shen · Zehao Xiao · Xiantong Zhen · Cees Snoek · Marcel Worring
- 2022 Poster: Variational Model Perturbation for Source-Free Domain Adaptation
  Mengmeng Jing · Xiantong Zhen · Jingjing Li · Cees Snoek
- 2021 Poster: Learning to Learn Dense Gaussian Processes for Few-Shot Learning
  Ze Wang · Zichen Miao · Xiantong Zhen · Qiang Qiu
- 2021 Poster: Variational Multi-Task Learning with Gumbel-Softmax Priors
  Jiayi Shen · Xiantong Zhen · Marcel Worring · Ling Shao
- 2020 Poster: Learning to Learn Variational Semantic Memory
  Xiantong Zhen · Yingjun Du · Huan Xiong · Qiang Qiu · Cees Snoek · Ling Shao