Video-Language Pretraining (VLP), which aims to learn transferable representations to advance a wide range of video-text downstream tasks, has recently received increasing attention. The best-performing works rely on large-scale, third-person video-text datasets such as HowTo100M. In this work, we exploit the recently released Ego4D dataset to pioneer egocentric VLP along three directions. (i) We create EgoClip, a first-person video-text pretraining dataset comprising 3.8M clip-text pairs well chosen from Ego4D, covering a large variety of human daily activities. (ii) We propose a novel pretraining objective, dubbed EgoNCE, which adapts video-text contrastive learning to the egocentric domain by mining egocentric-aware positive and negative samples. (iii) We introduce EgoMCQ, a development benchmark close to EgoClip that supports effective validation and fast exploration of our design decisions for EgoClip and EgoNCE. Furthermore, we demonstrate strong performance on five egocentric downstream tasks across three datasets: video-text retrieval on EPIC-KITCHENS-100; action recognition on Charades-Ego; and natural language query, moment query, and object state change classification on the Ego4D challenge benchmarks. The dataset and code are available at https://github.com/showlab/EgoVLP.
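To make the contrastive objective concrete: EgoNCE extends InfoNCE-style video-text contrastive learning by treating clips that share the same actions as additional positives. The abstract does not give the exact formulation, so the sketch below is a minimal, hedged illustration of that idea over a precomputed similarity matrix; the function name `info_nce_multi_pos` and the `positives` argument are illustrative, not the paper's API.

```python
import math

def info_nce_multi_pos(sim, tau=0.07, positives=None):
    """InfoNCE-style contrastive loss with optional multiple positives.

    sim[i][j]  : similarity (e.g. cosine) between video clip i and text j.
    tau        : temperature scaling the similarities.
    positives  : optional list of sets; positives[i] contains the text
                 indices treated as positives for clip i (EgoNCE-style
                 mining of clips sharing the same action). Defaults to
                 the matched diagonal pair only, i.e. vanilla InfoNCE.
    Returns the mean negative log-likelihood of the positive mass.
    """
    n = len(sim)
    if positives is None:
        positives = [{i} for i in range(n)]
    loss = 0.0
    for i in range(n):
        # Softmax denominator over all candidate texts for clip i.
        exp_row = [math.exp(s / tau) for s in sim[i]]
        denom = sum(exp_row)
        # Positive mass: the matched text plus any mined positives.
        pos = sum(exp_row[j] for j in positives[i])
        loss += -math.log(pos / denom)
    return loss / n
```

With a well-aligned batch (high similarity on matched pairs) the loss is lower than with uninformative similarities, which is the signal the pretraining objective optimizes; the paper's full objective additionally mines hard, scene-aware negatives, which would enlarge the denominator above.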
Author Information
Kevin Qinghong Lin (National University of Singapore)

I am currently a first-year Ph.D. student in Show Lab @ NUS, working with Prof. Mike Shou. Before that, I spent a wonderful year at Tencent as an intern, working with Dr. Wei Liu. I obtained my B.Sc. and M.Sc. degrees from Shenzhen University. My research interests lie in Multi-Modal Learning, especially Vision-Language Pretraining.
Jinpeng Wang (Sun Yat-sen University)
Mattia Soldan (KAUST)
Michael Wray (University of Bristol)
Rui Yan (Nanjing University of Science and Technology)
Eric Z. XU (National University of Singapore)
Difei Gao (NUS)
Rong-Cheng Tu (Beijing Institute of Technology)
Wenzhe Zhao (South China University of Technology)
Weijie Kong (Peking University)
Chengfei Cai (Zhejiang University)
WANG HongFa (Chinese Academy of Sciences)
Dima Damen (University of Bristol)

Professor of Computer Vision at the University of Bristol.
Bernard Ghanem (KAUST)
Wei Liu (Tencent)
Mike Zheng Shou (National University of Singapore)
Related Events (a corresponding poster, oral, or spotlight)
2022 Poster: Egocentric Video-Language Pretraining »
Thu. Dec 1st 05:00 -- 07:00 PM Room Hall J #626
More from the Same Authors
2021 Spotlight: ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning »
Guocheng Qian · Hasan Hammoud · Guohao Li · Ali Thabet · Bernard Ghanem

2022 : Certified Robustness in Federated Learning »
Motasem Alfarra · Juan Perez · Egor Shulgin · Peter Richtarik · Bernard Ghanem

2022 Spotlight: Lightning Talks 6A-4 »
Xiu-Shen Wei · Konstantina Dritsa · Guillaume Huguet · ABHRA CHAUDHURI · Zhenbin Wang · Kevin Qinghong Lin · Yutong Chen · Jianan Zhou · Yongsen Mao · Junwei Liang · Jinpeng Wang · Mao Ye · Yiming Zhang · Aikaterini Thoma · H.-Y. Xu · Daniel Sumner Magruder · Enwei Zhang · Jianing Zhu · Ronglai Zuo · Massimiliano Mancini · Hanxiao Jiang · Jun Zhang · Fangyun Wei · Faen Zhang · Ioannis Pavlopoulos · Zeynep Akata · Xiatian Zhu · Jingfeng ZHANG · Alexander Tong · Mattia Soldan · Chunhua Shen · Yuxin Peng · Liuhan Peng · Michael Wray · Tongliang Liu · Anjan Dutta · Yu Wu · Oluwadamilola Fasina · Panos Louridas · Angel Chang · Manik Kuchroo · Manolis Savva · Shujie LIU · Wei Zhou · Rui Yan · Gang Niu · Liang Tian · Bo Han · Eric Z. XU · Guy Wolf · Yingying Zhu · Brian Mak · Difei Gao · Masashi Sugiyama · Smita Krishnaswamy · Rong-Cheng Tu · Wenzhe Zhao · Weijie Kong · Chengfei Cai · WANG HongFa · Dima Damen · Bernard Ghanem · Wei Liu · Mike Zheng Shou

2022 Poster: PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies »
Guocheng Qian · Yuchen Li · Houwen Peng · Jinjie Mai · Hasan Hammoud · Mohamed Elhoseiny · Bernard Ghanem

2022 Poster: EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations »
Ahmad Darkhalil · Dandan Shan · Bin Zhu · Jian Ma · Amlan Kar · Richard Higgins · Sanja Fidler · David Fouhey · Dima Damen

2022 Poster: DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes »
Jia-Wei Liu · Yan-Pei Cao · Weijia Mao · Wenqiao Zhang · David Junhao Zhang · Jussi Keppo · Ying Shan · Xiaohu Qie · Mike Zheng Shou

2021 : Invited Talk - Dima Damen »
Dima Damen

2021 Poster: ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning »
Guocheng Qian · Hasan Hammoud · Guohao Li · Ali Thabet · Bernard Ghanem

2021 Poster: Low-Fidelity Video Encoder Optimization for Temporal Action Localization »
Mengmeng Xu · Juan Manuel Perez Rua · Xiatian Zhu · Bernard Ghanem · Brais Martinez

2020 Poster: Self-Supervised Learning by Cross-Modal Audio-Video Clustering »
Humam Alwassel · Dhruv Mahajan · Bruno Korbar · Lorenzo Torresani · Bernard Ghanem · Du Tran

2020 Spotlight: Self-Supervised Learning by Cross-Modal Audio-Video Clustering »
Humam Alwassel · Dhruv Mahajan · Bruno Korbar · Lorenzo Torresani · Bernard Ghanem · Du Tran