The new generation of state-of-the-art computer vision systems is trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, owing to the broad concept coverage achieved through large-scale data collection. Alternatively, we argue that learning with external knowledge about images is a promising approach that leverages a much more structured source of supervision and offers sample efficiency. In this paper, we propose K-LITE (Knowledge-augmented Language-Image Training and Evaluation), a simple strategy to leverage external knowledge for building transferable visual systems. In training, it enriches entities in natural language with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts. In evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones) to enable zero-shot and few-shot transfer of the pre-trained models. We study the performance of K-LITE on two important computer vision problems, image classification and object detection, benchmarking on 20 and 13 different existing datasets, respectively. The proposed knowledge-augmented models show significant improvement in transfer learning performance over existing methods. Our code is released at https://github.com/microsoft/klite.
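As a rough illustration of the knowledge-augmentation idea described in the abstract, the sketch below appends a dictionary-style definition to a class-name prompt before it would be passed to a text encoder. It is not taken from the released K-LITE code: the `TOY_KNOWLEDGE` table, the `augment_prompt` and `build_prompts` helpers, the prompt template, and the glosses are all hypothetical stand-ins for a real WordNet or Wiktionary lookup.

```python
from typing import Dict, List

# Hypothetical stand-in for an external knowledge source such as WordNet or
# Wiktionary; the glosses are illustrative and not copied from either resource.
TOY_KNOWLEDGE: Dict[str, str] = {
    "sashimi": "a dish of thinly sliced raw fish",
    "tabby": "a cat with a striped or mottled coat",
}


def augment_prompt(class_name: str, knowledge: Dict[str, str],
                   template: str = "a photo of a {}.") -> str:
    """Build a text prompt and append a definition when one is available."""
    prompt = template.format(class_name)
    gloss = knowledge.get(class_name.lower())
    if gloss is None:
        # Unknown concept: fall back to the plain, un-augmented prompt.
        return prompt
    return f"{prompt} {class_name}: {gloss}."


def build_prompts(class_names: List[str], knowledge: Dict[str, str]) -> List[str]:
    """Augment every class name; the same routine could serve training and evaluation."""
    return [augment_prompt(name, knowledge) for name in class_names]


prompts = build_prompts(["sashimi", "zebra"], TOY_KNOWLEDGE)
for p in prompts:
    print(p)
```

In this sketch a concept with no entry in the knowledge table simply keeps its plain prompt, which is one possible way to handle missing definitions; the actual method may treat such cases differently.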
Author Information
Sheng Shen (University of California Berkeley)
Chunyuan Li (Microsoft Research, Redmond)
Xiaowei Hu (University of Alberta)
Yujia Xie (Georgia Institute of Technology)
Jianwei Yang (Microsoft Research)
Pengchuan Zhang (California Institute of Technology)
Zhe Gan (Microsoft)
Lijuan Wang
Lu Yuan (Microsoft)
Ce Liu (Microsoft)
Kurt Keutzer (EECS, UC Berkeley)
Trevor Darrell (Electrical Engineering & Computer Science Department)
Anna Rohrbach (UC Berkeley)
Jianfeng Gao (Microsoft Research, Redmond, WA)
More from the Same Authors
2020 : Session B, Poster 4: Differentiable Top-k With Optimal Transport »
Yujia Xie -
2021 : Benchmark for Compositional Text-to-Image Synthesis »
Dong Huk Park · Samaneh Azadi · Xihui Liu · Trevor Darrell · Anna Rohrbach -
2021 : VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation »
Linjie Li · Jie Lei · Zhe Gan · Licheng Yu · Yen-Chun Chen · Rohit Pillai · Yu Cheng · Luowei Zhou · Xin Wang · William Yang Wang · Tamara L Berg · Mohit Bansal · Jingjing Liu · Lijuan Wang · Zicheng Liu -
2021 Spotlight: Focal Attention for Long-Range Interactions in Vision Transformers »
Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao -
2021 Spotlight: ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction »
Gengshan Yang · Deqing Sun · Varun Jampani · Daniel Vlasic · Forrester Cole · Ce Liu · Deva Ramanan -
2021 : Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models »
Boxin Wang · Chejian Xu · Shuohang Wang · Zhe Gan · Yu Cheng · Jianfeng Gao · Ahmed Awadallah · Bo Li -
2021 : Few-Shot Learning Evaluation in Natural Language Understanding »
Subhabrata Mukherjee · Xiaodong Liu · Guoqing Zheng · Saghar Hosseini · Hao Cheng · Ge Yang · Christopher Meek · Ahmed Awadallah · Jianfeng Gao -
2022 Poster: REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering »
Yuanze Lin · Yujia Xie · Dongdong Chen · Yichong Xu · Chenguang Zhu · Lu Yuan -
2022 Poster: OmniVL: One Foundation Model for Image-Language and Video-Language Tasks »
Junke Wang · Dongdong Chen · Zuxuan Wu · Chong Luo · Luowei Zhou · Yucheng Zhao · Yujia Xie · Ce Liu · Yu-Gang Jiang · Lu Yuan -
2023 Poster: Characterizing Scaling and Transfer Learning of Neural Networks for Scientific Machine Learning »
Shashank Subramanian · Peter Harrington · Kurt Keutzer · Wahid Bhimji · Dmitriy Morozov · Michael Mahoney · Amir Gholami -
2023 Poster: Visual Instruction Tuning »
Haotian Liu · Chunyuan Li · Qingyang Wu · Yong Jae Lee -
2023 Poster: Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection »
Lingchen Meng · Xiyang Dai · Jianwei Yang · Dongdong Chen · Yinpeng Chen · Mengchen Liu · Yi-Ling Chen · Zuxuan Wu · Lu Yuan · Yu-Gang Jiang -
2023 Poster: Bridging Discrete and Backpropagation: Straight-Through and Beyond »
Liyuan Liu · Chengyu Dong · Xiaodong Liu · Bin Yu · Jianfeng Gao -
2023 Poster: Hierarchical Open-vocabulary Universal Image Segmentation »
Xudong Wang · Shufan Li · Konstantinos Kallidromitis · Yusuke Kato · Kazuki Kozuka · Trevor Darrell -
2023 Poster: Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence »
Grace Luo · Lisa Dunlap · Dong Huk Park · Aleksander Holynski · Trevor Darrell -
2023 Poster: Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models »
Shihao Zhao · Dongdong Chen · Yen-Chun Chen · Jianmin Bao · Shaozhe Hao · Lu Yuan · Kwan-Yee K. Wong -
2023 Poster: Localized Symbolic Knowledge Distillation for Visual Commonsense Models »
Jae Sung Park · Jack Hessel · Khyathi Chandu · Paul Pu Liang · Ximing Lu · Qiuyuan Huang · Peter West · Jianfeng Gao · Ali Farhadi · Yejin Choi -
2023 Poster: Guiding Large Language Models via Directional Stimulus Prompting »
Zekun Li · Baolin Peng · Pengcheng He · Michel Galley · Jianfeng Gao · Xifeng Yan -
2023 Poster: Segment Everything Everywhere All at Once »
Xueyan Zou · Jianwei Yang · Hao Zhang · Feng Li · Linjie Li · Jianfeng Wang · Lijuan Wang · Jianfeng Gao · Yong Jae Lee -
2023 Poster: Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models »
Pan Lu · Baolin Peng · Hao Cheng · Michel Galley · Kai-Wei Chang · Ying Nian Wu · Song-Chun Zhu · Jianfeng Gao -
2023 Poster: Big Little Transformer Decoder »
Sehoon Kim · Karttikeya Mangalam · Suhong Moon · John Canny · Jitendra Malik · Michael Mahoney · Amir Gholami · Kurt Keutzer -
2023 Poster: Language Models Augmented with Decoupled Memory »
Weizhi Wang · Li Dong · Hao Cheng · Xiaodong Liu · Xifeng Yan · Jianfeng Gao · Furu Wei -
2023 Poster: Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation »
Lisa Dunlap · Alyssa Umino · Han Zhang · Jiezhi Yang · Joseph Gonzalez · Trevor Darrell -
2023 Poster: Language Models are Visual Reasoning Coordinators »
Liangyu Chen · Bo Li · Sheng Shen · Jingkang Yang · Chunyuan Li · Kurt Keutzer · Trevor Darrell · Ziwei Liu -
2023 Poster: LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day »
Chunyuan Li · Cliff Wong · Sheng Zhang · Naoto Usuyama · Haotian Liu · Jianwei Yang · Tristan Naumann · Hoifung Poon · Jianfeng Gao -
2023 Poster: NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations »
Varun Jampani · Kevis-kokitsi Maninis · Andreas Engelhardt · Arjun Karpur · Karen Truong · Kyle Sargent · Stefan Popov · Andre Araujo · Ricardo Martin Brualla · Kaushal Patel · Daniel Vlasic · Vittorio Ferrari · Ameesh Makadia · Ce Liu · Yuanzhen Li · Howard Zhou -
2023 Oral: Visual Instruction Tuning »
Haotian Liu · Chunyuan Li · Qingyang Wu · Yong Jae Lee -
2023 Oral: Bridging Discrete and Backpropagation: Straight-Through and Beyond »
Liyuan Liu · Chengyu Dong · Xiaodong Liu · Bin Yu · Jianfeng Gao -
2022 Spotlight: Focal Modulation Networks »
Jianwei Yang · Chunyuan Li · Xiyang Dai · Jianfeng Gao -
2022 Spotlight: ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models »
Chunyuan Li · Haotian Liu · Liunian Li · Pengchuan Zhang · Jyoti Aneja · Jianwei Yang · Ping Jin · Houdong Hu · Zicheng Liu · Yong Jae Lee · Jianfeng Gao -
2022 Spotlight: OmniVL: One Foundation Model for Image-Language and Video-Language Tasks »
Junke Wang · Dongdong Chen · Zuxuan Wu · Chong Luo · Luowei Zhou · Yucheng Zhao · Yujia Xie · Ce Liu · Yu-Gang Jiang · Lu Yuan -
2022 Panel: Panel 2C-4: UViM: A Unified… & K-LITE: Learning Transferable… »
Chunyuan Li · André Susano Pinto -
2022 Spotlight: Fault-Aware Neural Code Rankers »
Jeevana Priya Inala · Chenglong Wang · Mei Yang · Andres Codas · Mark Encarnación · Shuvendu Lahiri · Madanlal Musuvathi · Jianfeng Gao -
2022 Poster: Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone »
Zi-Yi Dou · Aishwarya Kamath · Zhe Gan · Pengchuan Zhang · Jianfeng Wang · Linjie Li · Zicheng Liu · Ce Liu · Yann LeCun · Nanyun Peng · Jianfeng Gao · Lijuan Wang -
2022 Poster: ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models »
Chunyuan Li · Haotian Liu · Liunian Li · Pengchuan Zhang · Jyoti Aneja · Jianwei Yang · Ping Jin · Houdong Hu · Zicheng Liu · Yong Jae Lee · Jianfeng Gao -
2022 Poster: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models »
Dongkuan (DK) Xu · Subhabrata Mukherjee · Xiaodong Liu · Debadeepta Dey · Wenhui Wang · Xiang Zhang · Ahmed Awadallah · Jianfeng Gao -
2022 Poster: NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis »
Jian Liang · Chenfei Wu · Xiaowei Hu · Zhe Gan · Jianfeng Wang · Lijuan Wang · Zicheng Liu · Yuejian Fang · Nan Duan -
2022 Poster: A Fast Post-Training Pruning Framework for Transformers »
Woosuk Kwon · Sehoon Kim · Michael Mahoney · Joseph Hassoun · Kurt Keutzer · Amir Gholami -
2022 Poster: Squeezeformer: An Efficient Transformer for Automatic Speech Recognition »
Sehoon Kim · Amir Gholami · Albert Shaw · Nicholas Lee · Karttikeya Mangalam · Jitendra Malik · Michael Mahoney · Kurt Keutzer -
2022 Poster: Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning »
Yujia Xie · Luowei Zhou · Xiyang Dai · Lu Yuan · Nguyen Bach · Ce Liu · Michael Zeng -
2022 Poster: Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens »
Elad Ben Avraham · Roei Herzig · Karttikeya Mangalam · Amir Bar · Anna Rohrbach · Leonid Karlinsky · Trevor Darrell · Amir Globerson -
2022 Poster: Focal Modulation Networks »
Jianwei Yang · Chunyuan Li · Xiyang Dai · Jianfeng Gao -
2022 Poster: Fault-Aware Neural Code Rankers »
Jeevana Priya Inala · Chenglong Wang · Mei Yang · Andres Codas · Mark Encarnación · Shuvendu Lahiri · Madanlal Musuvathi · Jianfeng Gao -
2022 Poster: Visual Prompting via Image Inpainting »
Amir Bar · Yossi Gandelsman · Trevor Darrell · Amir Globerson · Alexei Efros -
2022 Poster: GLIPv2: Unifying Localization and Vision-Language Understanding »
Haotian Zhang · Pengchuan Zhang · Xiaowei Hu · Yen-Chun Chen · Liunian Li · Xiyang Dai · Lijuan Wang · Lu Yuan · Jenq-Neng Hwang · Jianfeng Gao -
2022 Poster: 3DB: A Framework for Debugging Computer Vision Models »
Guillaume Leclerc · Hadi Salman · Andrew Ilyas · Sai Vemprala · Logan Engstrom · Vibhav Vineet · Kai Xiao · Pengchuan Zhang · Shibani Santurkar · Greg Yang · Ashish Kapoor · Aleksander Madry -
2021 Poster: Stronger NAS with Weaker Predictors »
Junru Wu · Xiyang Dai · Dongdong Chen · Yinpeng Chen · Mengchen Liu · Ye Yu · Zhangyang Wang · Zicheng Liu · Mei Chen · Lu Yuan -
2021 Poster: Neural-PIL: Neural Pre-Integrated Lighting for Reflectance Decomposition »
Mark Boss · Varun Jampani · Raphael Braun · Ce Liu · Jonathan Barron · Hendrik PA Lensch -
2021 Poster: ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction »
Gengshan Yang · Deqing Sun · Varun Jampani · Daniel Vlasic · Forrester Cole · Ce Liu · Deva Ramanan -
2021 Poster: Focal Attention for Long-Range Interactions in Vision Transformers »
Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao -
2021 Poster: Chasing Sparsity in Vision Transformers: An End-to-End Exploration »
Tianlong Chen · Yu Cheng · Zhe Gan · Lu Yuan · Lei Zhang · Zhangyang Wang -
2021 Poster: CLIP-It! Language-Guided Video Summarization »
Medhini Narasimhan · Anna Rohrbach · Trevor Darrell -
2021 Poster: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer »
Ge Yang · Edward Hu · Igor Babuschkin · Szymon Sidor · Xiaodong Liu · David Farhi · Nick Ryder · Jakub Pachocki · Weizhu Chen · Jianfeng Gao -
2021 : WebQA Competition + Q&A »
Yingshan CHANG · Yonatan Bisk · Mridu Narang · Levi Melnick · Jianfeng Gao · Hisami Suzuki · Guihong Cao -
2021 Poster: Early Convolutions Help Transformers See Better »
Tete Xiao · Mannat Singh · Eric Mintun · Trevor Darrell · Piotr Dollar · Ross Girshick -
2021 Poster: Teachable Reinforcement Learning via Advice Distillation »
Olivia Watkins · Abhishek Gupta · Trevor Darrell · Pieter Abbeel · Jacob Andreas -
2020 : Poster Session B »
Ravichandra Addanki · Andreea-Ioana Deac · Yujia Xie · Francesco Landolfi · Antoine Prouvost · Claudius Gros · Renzo Massobrio · Abhishek Cauligi · Simon Alford · Hanjun Dai · Alberto Franzin · Nitish Kumar Panigrahy · Brandon Kates · Iddo Drori · Taoan Huang · Zhou Zhou · Marin Vlastelica · Anselm Paulus · Aaron Zweig · Minsu Cho · Haiyan Yin · Michal Lisicki · Nan Jiang · Haoran Sun -
2020 Poster: Boundary thickness and robustness in learning models »
Yaoqing Yang · Rajiv Khanna · Yaodong Yu · Amir Gholami · Kurt Keutzer · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney -
2020 Poster: HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks »
Zhen Dong · Zhewei Yao · Daiyaan Arfeen · Amir Gholami · Michael Mahoney · Kurt Keutzer -
2020 Poster: Differentiable Top-k with Optimal Transport »
Yujia Xie · Hanjun Dai · Minshuo Chen · Bo Dai · Tuo Zhao · Hongyuan Zha · Wei Wei · Tomas Pfister -
2020 Poster: GreedyFool: Distortion-Aware Sparse Adversarial Attack »
Xiaoyi Dong · Dongdong Chen · Jianmin Bao · Chuan Qin · Lu Yuan · Weiming Zhang · Nenghai Yu · Dong Chen -
2019 Poster: Unified Language Model Pre-training for Natural Language Understanding and Generation »
Li Dong · Nan Yang · Wenhui Wang · Furu Wei · Xiaodong Liu · Yu Wang · Jianfeng Gao · Ming Zhou · Hsiao-Wuen Hon -
2019 Poster: ANODEV2: A Coupled Neural ODE Framework »
Tianjun Zhang · Zhewei Yao · Amir Gholami · Joseph Gonzalez · Kurt Keutzer · Michael Mahoney · George Biros -
2019 Poster: Multi-source Domain Adaptation for Semantic Segmentation »
Sicheng Zhao · Bo Li · Xiangyu Yue · Yang Gu · Pengfei Xu · Runbo Hu · Hua Chai · Kurt Keutzer -
2019 Poster: Meta Learning with Relational Information for Short Sequences »
Yujia Xie · Haoming Jiang · Feng Liu · Tuo Zhao · Hongyuan Zha -
2018 : Prof. Kurt Keutzer »
Kurt Keutzer -
2018 Poster: M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search »
Yelong Shen · Jianshu Chen · Po-Sen Huang · Yuqing Guo · Jianfeng Gao -
2018 Poster: Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization »
Yizhe Zhang · Michel Galley · Jianfeng Gao · Zhe Gan · Xiujun Li · Chris Brockett · Bill Dolan -
2018 Poster: Speaker-Follower Models for Vision-and-Language Navigation »
Daniel Fried · Ronghang Hu · Volkan Cirik · Anna Rohrbach · Jacob Andreas · Louis-Philippe Morency · Taylor Berg-Kirkpatrick · Kate Saenko · Dan Klein · Trevor Darrell -
2018 Poster: Hessian-based Analysis of Large Batch Training and Robustness to Adversaries »
Zhewei Yao · Amir Gholami · Qi Lei · Kurt Keutzer · Michael Mahoney -
2018 Poster: Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models »
Minjia Zhang · Wenhan Wang · Xiaodong Liu · Jianfeng Gao · Yuxiong He -
2017 : Invited Talk: Microsoft (Asli and Jianfeng) »
Jianfeng Gao -
2016 : Kurt Keutzer: High-Performance Deep Learning »
Kurt Keutzer -
2015 Poster: End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture »
Jianshu Chen · Ji He · Yelong Shen · Lin Xiao · Xiaodong He · Jianfeng Gao · Xinying Song · Li Deng