Capitalizing on large pre-trained models for various downstream tasks of interest has recently emerged as an approach with promising performance. Due to the ever-growing model size, the standard task adaptation strategy based on full fine-tuning becomes prohibitively costly in terms of model training and storage. This has led to a new research direction in parameter-efficient transfer learning. However, existing attempts typically focus on downstream tasks from the same modality (e.g., image understanding) as the pre-trained model. This is limiting because in some modalities (e.g., video understanding) a similarly strong pre-trained model with sufficient knowledge is scarce or unavailable. In this work, we investigate this novel cross-modality transfer learning setting, namely parameter-efficient image-to-video transfer learning. To solve this problem, we propose a new Spatio-Temporal Adapter (ST-Adapter) for parameter-efficient fine-tuning per video task. With a built-in spatio-temporal reasoning capability in a compact design, ST-Adapter enables a pre-trained image model without temporal knowledge to reason about dynamic video content at a small per-task parameter cost of ~8%, requiring approximately 20 times fewer updated parameters than previous work. Extensive experiments on video action recognition tasks show that our ST-Adapter can match or even outperform the strong full fine-tuning strategy and state-of-the-art video models, whilst enjoying the advantage of parameter efficiency.
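To give a feel for how a per-task cost of roughly 8% can arise, the following back-of-the-envelope sketch counts the parameters of a hypothetical bottleneck adapter. Everything specific here is an assumption made for illustration and is not stated in the abstract: the adapter layout (a linear down-projection, a depthwise convolution along the time axis, and a linear up-projection), the backbone size (about 86M parameters, 12 layers, width 768), the bottleneck width of 384, and one adapter per layer.

```python
# Rough parameter accounting for a bottleneck adapter with a depthwise
# temporal convolution. All sizes below are illustrative assumptions,
# not values taken from the abstract.

def adapter_param_count(d_model: int, bottleneck: int, temporal_kernel: int = 3) -> int:
    """Count weights and biases of one hypothetical adapter."""
    down = d_model * bottleneck + bottleneck          # linear down-projection
    dwconv = bottleneck * temporal_kernel + bottleneck  # depthwise conv over time
    up = bottleneck * d_model + d_model               # linear up-projection
    return down + dwconv + up


def per_task_fraction(backbone_params: int, num_layers: int,
                      adapters_per_layer: int, d_model: int,
                      bottleneck: int, temporal_kernel: int = 3) -> float:
    """Added (trainable) parameters as a fraction of the frozen backbone."""
    added = num_layers * adapters_per_layer * adapter_param_count(
        d_model, bottleneck, temporal_kernel)
    return added / backbone_params


# Assumed backbone: roughly 86M parameters, 12 layers, hidden width 768.
ASSUMED_BACKBONE_PARAMS = 86_000_000
fraction = per_task_fraction(ASSUMED_BACKBONE_PARAMS, num_layers=12,
                             adapters_per_layer=1, d_model=768, bottleneck=384)
print(f"added parameters per task: {fraction:.1%} of the backbone")
```

Under these assumptions the two projections dominate the count, so the per-task cost scales almost linearly with the bottleneck width, while the temporal convolution adds only a negligible number of parameters.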
Author Information
Junting Pan (The Chinese University of Hong Kong)
Ziyi Lin (The Chinese University of Hong Kong)
Xiatian Zhu (University of Surrey)
Jing Shao (SenseTime)
Hongsheng Li (The Chinese University of Hong Kong)
More from the Same Authors
- 2022 Poster: MetaTeacher: Coordinating Multi-Model Domain Adaptation for Medical Image Classification
  Zhenbin Wang · Mao Ye · Xiatian Zhu · Liuhan Peng · Liang Tian · Yingying Zhu
- 2023 Poster: Context-TAP: Tacking Any Point Demands Context Features
  Weikang BIAN · Zhaoyang Huang · Xiaoyu Shi · Yitong Dong · Yijin Li · Hongsheng Li
- 2023 Poster: A Unified Conditional Framework for Diffusion-based Image Restoration
  Yi Zhang · Xiaoyu Shi · Dasong Li · Xiaogang Wang · Jian Wang · Hongsheng Li
- 2023 Poster: HeadSculpt: Crafting 3D Head Avatars with Text
  Xiao Han · Yukang Cao · Kai Han · Xiatian Zhu · Jiankang Deng · Yi-Zhe Song · Tao Xiang · Kwan-Yee K. Wong
- 2022 Spotlight: Lightning Talks 6A-4
  Xiu-Shen Wei · Konstantina Dritsa · Guillaume Huguet · ABHRA CHAUDHURI · Zhenbin Wang · Kevin Qinghong Lin · Yutong Chen · Jianan Zhou · Yongsen Mao · Junwei Liang · Jinpeng Wang · Mao Ye · Yiming Zhang · Aikaterini Thoma · H.-Y. Xu · Daniel Sumner Magruder · Enwei Zhang · Jianing Zhu · Ronglai Zuo · Massimiliano Mancini · Hanxiao Jiang · Jun Zhang · Fangyun Wei · Faen Zhang · Ioannis Pavlopoulos · Zeynep Akata · Xiatian Zhu · Jingfeng ZHANG · Alexander Tong · Mattia Soldan · Chunhua Shen · Yuxin Peng · Liuhan Peng · Michael Wray · Tongliang Liu · Anjan Dutta · Yu Wu · Oluwadamilola Fasina · Panos Louridas · Angel Chang · Manik Kuchroo · Manolis Savva · Shujie LIU · Wei Zhou · Rui Yan · Gang Niu · Liang Tian · Bo Han · Eric Z. XU · Guy Wolf · Yingying Zhu · Brian Mak · Difei Gao · Masashi Sugiyama · Smita Krishnaswamy · Rong-Cheng Tu · Wenzhe Zhao · Weijie Kong · Chengfei Cai · WANG HongFa · Dima Damen · Bernard Ghanem · Wei Liu · Mike Zheng Shou
- 2022 Spotlight: MetaTeacher: Coordinating Multi-Model Domain Adaptation for Medical Image Classification
  Zhenbin Wang · Mao Ye · Xiatian Zhu · Liuhan Peng · Liang Tian · Yingying Zhu
- 2022 Spotlight: Lightning Talks 5B-4
  Yuezhi Yang · Zeyu Yang · Yong Lin · Yishi Xu · Linan Yue · Tao Yang · Weixin Chen · Qi Liu · Jiaqi Chen · Dongsheng Wang · Baoyuan Wu · Yuwang Wang · Hao Pan · Shengyu Zhu · Zhenwei Miao · Yan Lu · Lu Tan · Bo Chen · Yichao Du · Haoqian Wang · Wei Li · Yanqing An · Ruiying Lu · Peng Cui · Nanning Zheng · Li Wang · Zhibin Duan · Xiatian Zhu · Mingyuan Zhou · Enhong Chen · Li Zhang
- 2022 Spotlight: DeepInteraction: 3D Object Detection via Modality Interaction
  Zeyu Yang · Jiaqi Chen · Zhenwei Miao · Wei Li · Xiatian Zhu · Li Zhang
- 2022 Spotlight: Lightning Talks 4B-3
  Zicheng Zhang · Mancheng Meng · Antoine Guedon · Yue Wu · Wei Mao · Zaiyu Huang · Peihao Chen · Shizhe Chen · Yongwei Chen · Keqiang Sun · Yi Zhu · chen rui · Hanhui Li · Dongyu Ji · Ziyan Wu · miaomiao Liu · Pascal Monasse · Yu Deng · Shangzhe Wu · Pierre-Louis Guhur · Jiaolong Yang · Kunyang Lin · Makarand Tapaswi · Zhaoyang Huang · Terrence Chen · Jiabao Lei · Jianzhuang Liu · Vincent Lepetit · Zhenyu Xie · Richard I Hartley · Dinggang Shen · Xiaodan Liang · Runhao Zeng · Cordelia Schmid · Michael Kampffmeyer · Mathieu Salzmann · Ning Zhang · Fangyun Wei · Yabin Zhang · Fan Yang · Qifeng Chen · Wei Ke · Quan Wang · Thomas Li · qingling Cai · Kui Jia · Ivan Laptev · Mingkui Tan · Xin Tong · Hongsheng Li · Xiaodan Liang · Chuang Gan
- 2022 Spotlight: ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
  Junting Pan · Ziyi Lin · Xiatian Zhu · Jing Shao · Hongsheng Li
- 2022 Spotlight: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields
  Keqiang Sun · Shangzhe Wu · Zhaoyang Huang · Ning Zhang · Quan Wang · Hongsheng Li
- 2022 Spotlight: Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
  Jinguo Zhu · Xizhou Zhu · Wenhai Wang · Xiaohua Wang · Hongsheng Li · Xiaogang Wang · Jifeng Dai
- 2022 Spotlight: MCMAE: Masked Convolution Meets Masked Autoencoders
  Peng Gao · Teli Ma · Hongsheng Li · Ziyi Lin · Jifeng Dai · Yu Qiao
- 2022 Spotlight: Lightning Talks 2B-1
  Yehui Tang · Jian Wang · Zheng Chen · man zhou · Peng Gao · Chenyang Si · SHANGKUN SUN · Yixing Xu · Weihao Yu · Xinghao Chen · Kai Han · Hu Yu · Yulun Zhang · Chenhui Gou · Teli Ma · Yuanqi Chen · Yunhe Wang · Hongsheng Li · Jinjin Gu · Jianyuan Guo · Qiman Wu · Pan Zhou · Yu Zhu · Jie Huang · Chang Xu · Yichen Zhou · Haocheng Feng · Guodong Guo · yongbing zhang · Ziyi Lin · Feng Zhao · Ge Li · Junyu Han · Jinwei Gu · Jifeng Dai · Chao Xu · Xinchao Wang · Linghe Kong · Shuicheng Yan · Yu Qiao · Chen Change Loy · Xin Yuan · Errui Ding · Yunhe Wang · Deyu Meng · Jingdong Wang · Chongyi Li
- 2022 Poster: Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
  Renrui Zhang · Ziyu Guo · Peng Gao · Rongyao Fang · Bin Zhao · Dong Wang · Yu Qiao · Hongsheng Li
- 2022 Poster: Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
  Jinguo Zhu · Xizhou Zhu · Wenhai Wang · Xiaohua Wang · Hongsheng Li · Xiaogang Wang · Jifeng Dai
- 2022 Poster: MCMAE: Masked Convolution Meets Masked Autoencoders
  Peng Gao · Teli Ma · Hongsheng Li · Ziyi Lin · Jifeng Dai · Yu Qiao
- 2022 Poster: DeepInteraction: 3D Object Detection via Modality Interaction
  Zeyu Yang · Jiaqi Chen · Zhenwei Miao · Wei Li · Xiatian Zhu · Li Zhang
- 2022 Poster: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields
  Keqiang Sun · Shangzhe Wu · Zhaoyang Huang · Ning Zhang · Quan Wang · Hongsheng Li
- 2021 Poster: DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks
  Wei Sun · Aojun Zhou · Sander Stuijk · Rob Wijnhoven · Andrew Nelson · Hongsheng Li · Henk Corporaal
- 2021 Poster: Container: Context Aggregation Networks
  peng gao · Jiasen Lu · Hongsheng Li · Roozbeh Mottaghi · Aniruddha Kembhavi
- 2020 Poster: Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
  Yixiao Ge · Feng Zhu · Dapeng Chen · Rui Zhao · Hongsheng Li
- 2020 Poster: Balanced Meta-Softmax for Long-Tailed Visual Recognition
  Jiawei Ren · Cunjun Yu · shunan sheng · Xiao Ma · Haiyu Zhao · Shuai Yi · Hongsheng Li
- 2019 Poster: Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis
  Xihui Liu · Guojun Yin · Jing Shao · Xiaogang Wang · Hongsheng Li
- 2018 Poster: FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
  Yixiao Ge · Zhuowan Li · Haiyu Zhao · Guojun Yin · Shuai Yi · Xiaogang Wang · Hongsheng Li
- 2016 Poster: CRF-CNN: Modeling Structured Information in Human Pose Estimation
  Xiao Chu · Wanli Ouyang · Hongsheng Li · Xiaogang Wang