Most existing temporal action localization (TAL) methods rely on a transfer learning pipeline: a video encoder is first optimized on a large action classification dataset (i.e., the source domain), then frozen while a TAL head is trained on the action localization dataset (i.e., the target domain). This creates a task discrepancy problem for the video encoder, which is trained for action classification but used for TAL. Intuitively, jointly optimizing the video encoder and the TAL head is a strong baseline solution to this discrepancy. However, this is infeasible for TAL under GPU memory constraints, because processing long untrimmed videos is computationally prohibitive. In this paper, we resolve this challenge by introducing a novel low-fidelity (LoFi) video encoder optimization method. Instead of always using the full training configuration in TAL learning, we propose to reduce the mini-batch composition in terms of temporal, spatial, or spatio-temporal resolution, so that jointly optimizing the video encoder and TAL head becomes feasible within the memory budget of mid-range hardware. Crucially, this allows gradients from a TAL supervision loss to flow backwards through the video encoder, addressing the task discrepancy problem and yielding more effective feature representations. Extensive experiments show that the proposed LoFi optimization approach can significantly enhance the performance of existing TAL methods. Encouragingly, even with a lightweight ResNet18-based video encoder on a single RGB stream, our method surpasses two-stream (RGB + optical flow) ResNet50-based alternatives, often by a good margin. Our code is publicly available at https://github.com/saic-fi/lofiactionlocalization.
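The core LoFi idea, shrinking each training clip along time, space, or both so that the encoder and TAL head fit in memory for joint training, can be illustrated with a minimal Python sketch. It assumes, as a rough approximation that the abstract does not state, that activation memory of a convolutional video encoder scales linearly with the number of input voxels (frames × height × width). `ClipConfig`, `low_fidelity` and `relative_cost` are hypothetical names, not taken from the released code, and the 256 × 224 × 224 clip is purely illustrative rather than the paper's setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipConfig:
    """Shape of one training clip: T frames of H x W pixels (hypothetical helper)."""
    frames: int
    height: int
    width: int

    def voxels(self) -> int:
        # Number of input voxels, used here only as a rough proxy for activation memory.
        return self.frames * self.height * self.width


def low_fidelity(cfg: ClipConfig, mode: str, factor: int = 2) -> ClipConfig:
    """Shrink a clip along time, space, or both by an integer factor (assumed scheme)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    t = max(1, cfg.frames // factor)
    h = max(1, cfg.height // factor)
    w = max(1, cfg.width // factor)
    if mode == "temporal":
        return ClipConfig(t, cfg.height, cfg.width)
    if mode == "spatial":
        return ClipConfig(cfg.frames, h, w)
    if mode == "spatiotemporal":
        return ClipConfig(t, h, w)
    raise ValueError(f"unknown mode: {mode!r}")


def relative_cost(full: ClipConfig, reduced: ClipConfig) -> float:
    """Memory of the reduced clip relative to the full one, assuming
    memory grows linearly with voxel count."""
    return reduced.voxels() / full.voxels()


if __name__ == "__main__":
    full = ClipConfig(frames=256, height=224, width=224)  # illustrative values only
    for mode in ("temporal", "spatial", "spatiotemporal"):
        reduced = low_fidelity(full, mode)
        print(f"{mode:>15}: {reduced}  relative cost = {relative_cost(full, reduced):.3f}")
```

Under these assumptions, halving along time, space, or both gives relative costs of 0.5, 0.25 and 0.125 respectively; spatial reduction saves more than temporal reduction at the same factor because it shrinks two dimensions. The sketch only does the shape bookkeeping: the actual resampling of frames and pixels (for example frame striding or bilinear resizing) and the end-to-end backward pass through the encoder are left out.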
Author Information
Mengmeng Xu (KAUST)
Juan Manuel Perez Rua (Meta AI)
Xiatian Zhu (Samsung AI Centre, Cambridge)
Bernard Ghanem (KAUST)
Brais Martinez (Samsung AI Center)
More from the Same Authors
2021 Spotlight: ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning
Guocheng Qian · Hasan Hammoud · Guohao Li · Ali Thabet · Bernard Ghanem
2021 Spotlight: SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu · Jinghan Yao · Junge Zhang · Xiatian Zhu · Hang Xu · Weiguo Gao · Chunjing XU · Tao Xiang · Li Zhang
2022 Poster: MetaTeacher: Coordinating Multi-Model Domain Adaptation for Medical Image Classification
Zhenbin Wang · Mao Ye · Xiatian Zhu · Liuhan Peng · Liang Tian · Yingying Zhu
2022: Certified Robustness in Federated Learning
Motasem Alfarra · Juan Perez · Egor Shulgin · Peter Richtarik · Bernard Ghanem
2023 Poster: Dynamically Masked Discriminator for GANs
Wentian Zhang · Haozhe Liu · Bing Li · Jinheng Xie · Yawen Huang · Yuexiang Li · Yefeng Zheng · Bernard Ghanem
2023 Poster: CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society
Guohao Li · Hasan Hammoud · Hani Itani · Dmitrii Khizbullin · Bernard Ghanem
2023 Poster: HeadSculpt: Crafting 3D Head Avatars with Text
Xiao Han · Yukang Cao · Kai Han · Xiatian Zhu · Jiankang Deng · Yi-Zhe Song · Tao Xiang · Kwan-Yee K. Wong
2022 Spotlight: Lightning Talks 6A-4
Xiu-Shen Wei · Konstantina Dritsa · Guillaume Huguet · ABHRA CHAUDHURI · Zhenbin Wang · Kevin Qinghong Lin · Yutong Chen · Jianan Zhou · Yongsen Mao · Junwei Liang · Jinpeng Wang · Mao Ye · Yiming Zhang · Aikaterini Thoma · H.-Y. Xu · Daniel Sumner Magruder · Enwei Zhang · Jianing Zhu · Ronglai Zuo · Massimiliano Mancini · Hanxiao Jiang · Jun Zhang · Fangyun Wei · Faen Zhang · Ioannis Pavlopoulos · Zeynep Akata · Xiatian Zhu · Jingfeng ZHANG · Alexander Tong · Mattia Soldan · Chunhua Shen · Yuxin Peng · Liuhan Peng · Michael Wray · Tongliang Liu · Anjan Dutta · Yu Wu · Oluwadamilola Fasina · Panos Louridas · Angel Chang · Manik Kuchroo · Manolis Savva · Shujie LIU · Wei Zhou · Rui Yan · Gang Niu · Liang Tian · Bo Han · Eric Z. XU · Guy Wolf · Yingying Zhu · Brian Mak · Difei Gao · Masashi Sugiyama · Smita Krishnaswamy · Rong-Cheng Tu · Wenzhe Zhao · Weijie Kong · Chengfei Cai · WANG HongFa · Dima Damen · Bernard Ghanem · Wei Liu · Mike Zheng Shou
2022 Spotlight: MetaTeacher: Coordinating Multi-Model Domain Adaptation for Medical Image Classification
Zhenbin Wang · Mao Ye · Xiatian Zhu · Liuhan Peng · Liang Tian · Yingying Zhu
2022 Spotlight: Egocentric Video-Language Pretraining
Kevin Qinghong Lin · Jinpeng Wang · Mattia Soldan · Michael Wray · Rui Yan · Eric Z. XU · Difei Gao · Rong-Cheng Tu · Wenzhe Zhao · Weijie Kong · Chengfei Cai · WANG HongFa · Dima Damen · Bernard Ghanem · Wei Liu · Mike Zheng Shou
2022 Spotlight: Lightning Talks 5B-4
Yuezhi Yang · Zeyu Yang · Yong Lin · Yishi Xu · Linan Yue · Tao Yang · Weixin Chen · Qi Liu · Jiaqi Chen · Dongsheng Wang · Baoyuan Wu · Yuwang Wang · Hao Pan · Shengyu Zhu · Zhenwei Miao · Yan Lu · Lu Tan · Bo Chen · Yichao Du · Haoqian Wang · Wei Li · Yanqing An · Ruiying Lu · Peng Cui · Nanning Zheng · Li Wang · Zhibin Duan · Xiatian Zhu · Mingyuan Zhou · Enhong Chen · Li Zhang
2022 Spotlight: DeepInteraction: 3D Object Detection via Modality Interaction
Zeyu Yang · Jiaqi Chen · Zhenwei Miao · Wei Li · Xiatian Zhu · Li Zhang
2022 Spotlight: ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Junting Pan · Ziyi Lin · Xiatian Zhu · Jing Shao · Hongsheng Li
2022 Poster: PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies
Guocheng Qian · Yuchen Li · Houwen Peng · Jinjie Mai · Hasan Hammoud · Mohamed Elhoseiny · Bernard Ghanem
2022 Poster: Egocentric Video-Language Pretraining
Kevin Qinghong Lin · Jinpeng Wang · Mattia Soldan · Michael Wray · Rui Yan · Eric Z. XU · Difei Gao · Rong-Cheng Tu · Wenzhe Zhao · Weijie Kong · Chengfei Cai · WANG HongFa · Dima Damen · Bernard Ghanem · Wei Liu · Mike Zheng Shou
2022 Poster: DeepInteraction: 3D Object Detection via Modality Interaction
Zeyu Yang · Jiaqi Chen · Zhenwei Miao · Wei Li · Xiatian Zhu · Li Zhang
2022 Poster: ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Junting Pan · Ziyi Lin · Xiatian Zhu · Jing Shao · Hongsheng Li
2021 Poster: Space-time Mixing Attention for Video Transformer
Adrian Bulat · Juan Manuel Perez Rua · Swathikiran Sudhakaran · Brais Martinez · Georgios Tzimiropoulos
2021 Poster: ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning
Guocheng Qian · Hasan Hammoud · Guohao Li · Ali Thabet · Bernard Ghanem
2021 Poster: SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu · Jinghan Yao · Junge Zhang · Xiatian Zhu · Hang Xu · Weiguo Gao · Chunjing XU · Tao Xiang · Li Zhang
2020 Poster: Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel · Dhruv Mahajan · Bruno Korbar · Lorenzo Torresani · Bernard Ghanem · Du Tran
2020 Spotlight: Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel · Dhruv Mahajan · Bruno Korbar · Lorenzo Torresani · Bernard Ghanem · Du Tran