Poster
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li · Haotian Liu · Liunian Li · Pengchuan Zhang · Jyoti Aneja · Jianwei Yang · Ping Jin · Houdong Hu · Zicheng Liu · Yong Jae Lee · Jianfeng Gao

Thu Dec 01 02:00 PM -- 04:00 PM (PST) @ Hall J #1001

Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these foundation models due to the lack of easy-to-use toolkits for fair benchmarking. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark to compare and evaluate pre-trained language-augmented visual models. Highlights include: (i) Datasets. The downstream evaluation suite consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge. (ii) Toolkit. An automatic hyper-parameter tuning toolkit is developed to ensure fairness in model adaptation. To leverage the full power of language-augmented visual models, novel language-aware initialization methods are proposed that significantly improve adaptation performance. (iii) Metrics. A variety of evaluation metrics are used, covering sample efficiency (zero-shot and few-shot) and parameter efficiency (linear probing and full model fine-tuning). We will publicly release ELEVATER.
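As a rough illustration of the language-aware initialization idea mentioned in the abstract, the sketch below initializes a linear classification head from class-name text embeddings, so that the head starts out reproducing zero-shot predictions before any few-shot or linear-probing updates. This is a minimal sketch under stated assumptions, not the ELEVATER toolkit's implementation: the function names are hypothetical, and random vectors stand in for the outputs of a real text and image encoder.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def zero_shot_predict(image_features, class_text_embeddings):
    """Pick the class whose text embedding has the highest cosine similarity."""
    sims = l2_normalize(image_features) @ l2_normalize(class_text_embeddings).T
    return sims.argmax(axis=1)


def language_aware_init(class_text_embeddings):
    """Hypothetical helper: build linear-head parameters from text embeddings.

    The weight rows are the normalized class text embeddings and the bias is
    zero, so the untrained head matches the zero-shot classifier.
    """
    weight = l2_normalize(class_text_embeddings)
    bias = np.zeros(weight.shape[0])
    return weight, bias


def linear_head_predict(image_features, weight, bias):
    """Apply the linear head to normalized image features."""
    logits = l2_normalize(image_features) @ weight.T + bias
    return logits.argmax(axis=1)


# Stand-in data (assumption): 5 classes, 8 images, 16-dimensional embeddings.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(5, 16))
img_feat = rng.normal(size=(8, 16))

weight, bias = language_aware_init(text_emb)
```

Starting from `weight` and `bias` rather than a random head means that subsequent fine-tuning begins at the zero-shot solution; how the toolkit actually performs this step is not specified in the abstract.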

Author Information

Chunyuan Li (Microsoft Research, Redmond)
Haotian Liu (UW Madison)
Liunian Li (University of California, Los Angeles)
Pengchuan Zhang (California Institute of Technology)
Jyoti Aneja (University of Illinois, Urbana Champaign)

I am a graduate student at UIUC working on image captioning using GANs.

Jianwei Yang (Microsoft Research)
Ping Jin (Microsoft)
Houdong Hu (University of California, San Diego)
Zicheng Liu (Microsoft)
Yong Jae Lee (Department of Computer Sciences, University of Wisconsin-Madison)
Jianfeng Gao (Microsoft Research, Redmond, WA)
