K-LITE: Learning Transferable Visual Models with External Knowledge
Sheng Shen · Chunyuan Li · Xiaowei Hu · Yujia Xie · Jianwei Yang · Pengchuan Zhang · Zhe Gan · Lijuan Wang · Lu Yuan · Ce Liu · Kurt Keutzer · Trevor Darrell · Anna Rohrbach · Jianfeng Gao

Thu Dec 01 02:00 PM -- 04:00 PM (PST) @ Hall J #230

The new generation of state-of-the-art computer vision systems is trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, thanks to the broad concept coverage achieved through large-scale data collection. Alternatively, we argue that learning with external knowledge about images is a promising approach that leverages a much more structured source of supervision and offers sample efficiency. In this paper, we propose K-LITE (Knowledge-augmented Language-Image Training and Evaluation), a simple strategy to leverage external knowledge for building transferable visual systems. In training, it enriches entities in natural language with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts. In evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones) to enable zero-shot and few-shot transfer of the pre-trained models. We study the performance of K-LITE on two important computer vision problems, image classification and object detection, benchmarking on 20 and 13 different existing datasets, respectively. The proposed knowledge-augmented models show significant improvement in transfer learning performance over existing methods. Our code is released at https://github.com/microsoft/klite.
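
To make the idea of knowledge-augmented text concrete, the following is a minimal sketch and not the released K-LITE implementation. It assumes a tiny hand-written dictionary (`KNOWLEDGE`, hypothetical) standing in for WordNet and Wiktionary lookups, and it assumes a simple "append the definition" format with a `" ; "` separator and an "a photo of a {}." template. The function names `augment` and `class_prompts` are illustrative only.

```python
import re

# Hypothetical stand-in for WordNet / Wiktionary definitions.
# A real system would query those resources instead of a fixed dict.
KNOWLEDGE = {
    "sashimi": "a Japanese dish of thinly sliced raw fish",
    "takoyaki": "a ball-shaped Japanese snack filled with diced octopus",
}


def augment(text, knowledge=KNOWLEDGE):
    """Append a definition for every known entity mentioned in `text`.

    Text that mentions no known entity is returned unchanged.
    """
    extras = []
    for entity, gloss in knowledge.items():
        # Whole-word, case-insensitive match of the entity name.
        if re.search(rf"\b{re.escape(entity)}\b", text, flags=re.IGNORECASE):
            extras.append(f"{entity}: {gloss}")
    if not extras:
        return text
    return text + " ; " + " ; ".join(extras)


def class_prompts(class_names, template="a photo of a {}."):
    """Build one knowledge-augmented prompt per class name.

    The same augmentation can be applied to captions during training
    and to class-name prompts during zero-shot evaluation.
    """
    return [augment(template.format(name)) for name in class_names]


if __name__ == "__main__":
    for prompt in class_prompts(["sashimi", "dog"]):
        print(prompt)
```

In this sketch a rare concept such as "sashimi" receives an explanatory definition built from more common words, while a class with no dictionary entry keeps its plain prompt. The augmented strings would then be passed to a text encoder in place of the raw class names or captions.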

Author Information

Sheng Shen (University of California Berkeley)
Chunyuan Li (Microsoft Research, Redmond)
Xiaowei Hu (University of Alberta)
Yujia Xie (Georgia Institute of Technology)
Jianwei Yang (Microsoft Research)
Pengchuan Zhang (California Institute of Technology)
Zhe Gan (Microsoft)
Lijuan Wang
Lu Yuan (Microsoft)
Ce Liu (Microsoft)
Kurt Keutzer (EECS, UC Berkeley)
Trevor Darrell (Electrical Engineering & Computer Science Department)
Anna Rohrbach (UC Berkeley)
Jianfeng Gao (Microsoft Research, Redmond, WA)
