Timezone: »

Compressed Video Contrastive Learning
Yuqi Huo · Mingyu Ding · Haoyu Lu · Nanyi Fei · Zhiwu Lu · Ji-Rong Wen · Ping Luo

Tue Dec 07 08:30 AM -- 10:00 AM (PST) @

This work concerns self-supervised video representation learning (SSVRL), one topic that has received much attention recently. Since videos are storage-intensive and contain a rich source of visual content, models designed for SSVRL are expected to be storage- and computation-efficient, as well as effective. However, most existing methods only focus on one of the two objectives, failing to consider both at the same time. In this work, for the first time, the seemingly contradictory goals are simultaneously achieved by exploiting compressed videos and capturing mutual information between two input streams. Specifically, a novel Motion Vector based Cross Guidance Contrastive learning approach (MVCGC) is proposed. For storage and computation efficiency, we choose to directly decode RGB frames and motion vectors (that resemble low-resolution optical flows) from compressed videos on-the-fly. To enhance the representation ability of the motion vectors, hence the effectiveness of our method, we design a cross guidance contrastive learning algorithm based on multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa. Comprehensive experiments on two downstream tasks show that our MVCGC yields new state-of-the-art while being significantly more efficient than its competitors.

Author Information

Yuqi Huo (Renmin University of China)
Mingyu Ding (The University of Hong Kong)
Haoyu Lu (Renmin University of China)
Nanyi Fei (Renmin University of China)
Zhiwu Lu (Renmin University of China)
Ji-Rong Wen (Renmin University of China)
Ping Luo (The University of Hong Kong)

More from the Same Authors