Frozen pretrained models have become a viable alternative to the pretraining-then-finetuning paradigm for transfer learning. However, with frozen models there are relatively few parameters available for adapting to downstream tasks, which is problematic in computer vision where tasks vary significantly in input/output format and the type of information that is of value. In this paper, we present a study of frozen pretrained models when applied to diverse and representative computer vision tasks, including object detection, semantic segmentation and video action recognition. From this empirical analysis, our work answers the questions of what pretraining task fits best with this frozen setting, how to make the frozen setting more flexible to various downstream tasks, and the effect of larger model sizes. We additionally examine the upper bound of performance using a giant frozen pretrained model with 3 billion parameters (SwinV2-G) and find that it reaches competitive performance on a varied set of major benchmarks with only one shared frozen base network: 60.0 box mAP and 52.2 mask mAP on COCO object detection test-dev, 57.6 val mIoU on ADE20K semantic segmentation, and 81.7 top-1 accuracy on Kinetics-400 action recognition. With this work, we hope to bring greater attention to this promising path of freezing pretrained image models.
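The frozen setting described above can be illustrated with a toy example: the backbone's weights are only read, and gradient updates touch nothing but a small task head. This is a minimal pure-Python sketch, assuming a tiny one-layer ReLU "backbone" with random weights as a stand-in for a real pretrained network such as SwinV2-G, and a linear regression head as a stand-in for the detection, segmentation, and video heads the paper actually uses. The names `frozen_backbone`, `train_head`, and the synthetic data are hypothetical and do not reflect the authors' implementation.

```python
import copy
import random


def frozen_backbone(x, W):
    """Toy stand-in for a frozen pretrained network: features = ReLU(W @ x).
    W is only read here and is never updated."""
    return [max(0.0, sum(w * xj for w, xj in zip(row, x))) for row in W]


def predict(head, bias, feats):
    """Linear task head applied on top of the frozen features."""
    return sum(h * f for h, f in zip(head, feats)) + bias


def mse(data, W, head, bias):
    return sum((predict(head, bias, frozen_backbone(x, W)) - y) ** 2
               for x, y in data) / len(data)


def train_head(data, W, lr=0.05, epochs=300):
    """SGD on squared error that updates only the head (weights and bias)."""
    head = [0.0] * len(W)
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = frozen_backbone(x, W)  # no update is ever applied to W
            err = predict(head, bias, feats) - y
            head = [h - lr * err * f for h, f in zip(head, feats)]
            bias -= lr * err
    return head, bias


random.seed(0)
IN_DIM, FEAT_DIM = 3, 4
# Hypothetical "pretrained" weights; a real setup would load a checkpoint.
W = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(FEAT_DIM)]
W_before = copy.deepcopy(W)

# Synthetic downstream task that is linear in the frozen features.
true_head = [random.uniform(-1, 1) for _ in range(FEAT_DIM)]
data = []
for _ in range(32):
    x = [random.uniform(-1, 1) for _ in range(IN_DIM)]
    data.append((x, predict(true_head, 0.5, frozen_backbone(x, W))))

loss_before = mse(data, W, [0.0] * FEAT_DIM, 0.0)
head, bias = train_head(data, W)
loss_after = mse(data, W, head, bias)

frozen_params = FEAT_DIM * IN_DIM   # 12 backbone weights, never trained
trainable_params = FEAT_DIM + 1     # 5 head parameters (weights + bias)
print(f"frozen={frozen_params} trainable={trainable_params}")
print(f"loss before={loss_before:.4f} after={loss_after:.4f}")
```

The point of the sketch is the asymmetry the abstract highlights: one shared set of backbone weights stays fixed, so every downstream task must adapt through its own small set of head parameters. In a real framework the same effect is usually achieved by disabling gradients for the backbone's parameters rather than by hand-written updates.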
Author Information
Yutong Lin (Xi'an Jiaotong University)
Ze Liu (University of Science and Technology of China)
Zheng Zhang (MSRA)
Han Hu (Microsoft Research Asia)
Nanning Zheng (Xi'an Jiaotong University)
Stephen Lin (Microsoft Research)
Yue Cao (Microsoft Research)
More from the Same Authors
2020: Paper 62: Instance-wise Depth and Motion Learning from Monocular Videos
Seokju Lee · Sunghoon Im · Stephen Lin · In So Kweon
2021 Spotlight: Aligning Pretraining for Detection via Object-Level Contrastive Learning
Fangyun Wei · Yue Gao · Zhirong Wu · Han Hu · Stephen Lin
2021 Spotlight: Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning
Hanzhe Hu · Fangyun Wei · Han Hu · Qiwei Ye · Jinshi Cui · Liwei Wang
2021 Spotlight: Bootstrap Your Object Detector via Mixed Training
Mengde Xu · Zheng Zhang · Fangyun Wei · Yutong Lin · Yue Cao · Stephen Lin · Han Hu · Xiang Bai
2022: All are Worth Words: a ViT Backbone for Score-based Diffusion Models
Fan Bao · Chongxuan LI · Yue Cao · Jun Zhu
2022 Spotlight: Lightning Talks 6B-3
Lingfeng Yang · Yao Lai · Zizheng Pan · Zhenyu Wang · Weicong Liang · Chuanyang Zheng · Jian-Wei Zhang · Peng Jin · Jing Liu · Xiuying Wei · Yao Mu · Xiang Li · YUHUI YUAN · Zizheng Pan · Yifan Sun · Yunchen Zhang · Jianfei Cai · Hao Luo · zheyang li · Jinfa Huang · Haoyu He · Yi Yang · Ping Luo · Fenglin Liu · Henghui Ding · Borui Zhao · Xiangguo Zhang · Kai Zhang · Pichao WANG · Bohan Zhuang · Wei Chen · Ruihao Gong · Zhi Yang · Xian Wu · Feng Ding · Jianfei Cai · Xiao Luo · Renjie Song · Weihong Lin · Jian Yang · Wenming Tan · Bohan Zhuang · Shanghang Zhang · Shen Ge · Fan Wang · Qi Zhang · Guoli Song · Jun Xiao · Hao Li · Ding Jia · David Clifton · Ye Ren · Fengwei Yu · Zheng Zhang · Jie Chen · Shiliang Pu · Xianglong Liu · Chao Zhang · Han Hu
2022 Spotlight: Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang · YUHUI YUAN · Henghui Ding · Xiao Luo · Weihong Lin · Ding Jia · Zheng Zhang · Chao Zhang · Han Hu
2022 Spotlight: Lightning Talks 5B-4
Yuezhi Yang · Zeyu Yang · Yong Lin · Yishi Xu · Linan Yue · Tao Yang · Weixin Chen · Qi Liu · Jiaqi Chen · Dongsheng Wang · Baoyuan Wu · Yuwang Wang · Hao Pan · Shengyu Zhu · Zhenwei Miao · Yan Lu · Lu Tan · Bo Chen · Yichao Du · Haoqian Wang · Wei Li · Yanqing An · Ruiying Lu · Peng Cui · Nanning Zheng · Li Wang · Zhibin Duan · Xiatian Zhu · Mingyuan Zhou · Enhong Chen · Li Zhang
2022 Spotlight: Visual Concepts Tokenization
Tao Yang · Yuwang Wang · Yan Lu · Nanning Zheng
2022 Spotlight: Lightning Talks 2A-3
David Buterez · Chengan He · Xuan Kan · Yutong Lin · Konstantin Schürholt · Yu Yang · Louis Annabi · Wei Dai · Xiaotian Cheng · Alexandre Pitti · Ze Liu · Jon Paul Janet · Jun Saito · Boris Knyazev · Mathias Quoy · Zheng Zhang · James Zachary · Steven J Kiddle · Xavier Giro-i-Nieto · Chang Liu · Hejie Cui · Zilong Zhang · Hakan Bilen · Damian Borth · Dino Oglic · Holly Rushmeier · Han Hu · Xiangyang Ji · Yi Zhou · Nanning Zheng · Ying Guo · Pietro Liò · Stephen Lin · Carl Yang · Yue Cao
2022 Spotlight: Could Giant Pre-trained Image Models Extract Universal Representations?
Yutong Lin · Ze Liu · Zheng Zhang · Han Hu · Nanning Zheng · Stephen Lin · Yue Cao
2022 Poster: Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang · YUHUI YUAN · Henghui Ding · Xiao Luo · Weihong Lin · Ding Jia · Zheng Zhang · Chao Zhang · Han Hu
2022 Poster: Visual Concepts Tokenization
Tao Yang · Yuwang Wang · Yan Lu · Nanning Zheng
2021 Poster: Co-evolution Transformer for Protein Contact Prediction
He Zhang · Fusong Ju · Jianwei Zhu · Liang He · Bin Shao · Nanning Zheng · Tie-Yan Liu
2021 Poster: Dynamic Grained Encoder for Vision Transformers
Lin Song · Songyang Zhang · Songtao Liu · Zeming Li · Xuming He · Hongbin Sun · Jian Sun · Nanning Zheng
2021 Poster: Instance-Conditional Knowledge Distillation for Object Detection
Zijian Kang · Peizhen Zhang · Xiangyu Zhang · Jian Sun · Nanning Zheng
2021 Poster: The Emergence of Objectness: Learning Zero-shot Segmentation from Videos
Runtao Liu · Zhirong Wu · Stella Yu · Stephen Lin
2021 Poster: Aligning Pretraining for Detection via Object-Level Contrastive Learning
Fangyun Wei · Yue Gao · Zhirong Wu · Han Hu · Stephen Lin
2021 Poster: Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning
Hanzhe Hu · Fangyun Wei · Han Hu · Qiwei Ye · Jinshi Cui · Liwei Wang
2021 Poster: Bootstrap Your Object Detector via Mixed Training
Mengde Xu · Zheng Zhang · Fangyun Wei · Yutong Lin · Yue Cao · Stephen Lin · Han Hu · Xiang Bai
2020 Poster: RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder
Cheng Chi · Fangyun Wei · Han Hu
2020 Spotlight: RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder
Cheng Chi · Fangyun Wei · Han Hu
2020 Poster: Compositional Generalization by Learning Analytical Expressions
Qian Liu · Shengnan An · Jian-Guang Lou · Bei Chen · Zeqi Lin · Yan Gao · Bin Zhou · Nanning Zheng · Dongmei Zhang
2020 Spotlight: Compositional Generalization by Learning Analytical Expressions
Qian Liu · Shengnan An · Jian-Guang Lou · Bei Chen · Zeqi Lin · Yan Gao · Bin Zhou · Nanning Zheng · Dongmei Zhang
2020 Poster: Rethinking Learnable Tree Filter for Generic Feature Transform
Lin Song · Yanwei Li · Zhengkai Jiang · Zeming Li · Xiangyu Zhang · Hongbin Sun · Jian Sun · Nanning Zheng
2020 Poster: Fine-Grained Dynamic Head for Object Detection
Lin Song · Yanwei Li · Zhengkai Jiang · Zeming Li · Hongbin Sun · Jian Sun · Nanning Zheng
2020 Poster: RepPoints v2: Verification Meets Regression for Object Detection
Yihong Chen · Zheng Zhang · Yue Cao · Liwei Wang · Stephen Lin · Han Hu
2020 Poster: Parametric Instance Classification for Unsupervised Visual Feature learning
Yue Cao · Zhenda Xie · Bin Liu · Yutong Lin · Zheng Zhang · Han Hu
2019 Poster: Learnable Tree Filter for Structure-preserving Feature Transform
Lin Song · Yanwei Li · Zeming Li · Gang Yu · Hongbin Sun · Jian Sun · Nanning Zheng
2018 Poster: Recurrent Transformer Networks for Semantic Correspondence
Seungryong Kim · Stephen Lin · Sangryul Jeon · Dongbo Min · Kwanghoon Sohn
2018 Spotlight: Recurrent Transformer Networks for Semantic Correspondence
Seungryong Kim · Stephen Lin · Sangryul Jeon · Dongbo Min · Kwanghoon Sohn