This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner. Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames. We exploit the synergy between both tasks through a shared inter-frame affinity matrix, which simultaneously models transitions between video frames at both the region and pixel levels. Region-level localization helps reduce ambiguities in fine-grained matching by narrowing down the search regions, while fine-grained matching provides bottom-up features that facilitate region-level localization. Our method outperforms state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video-object and part-segmentation propagation, keypoint tracking, and object tracking. Our self-supervised method even surpasses the affinity features obtained from a fully supervised ResNet-18 pre-trained on ImageNet.
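To make the idea of an inter-frame affinity matrix concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes each pixel is described by a feature vector, that the affinity is a softmax over source pixels of feature dot products (with a hypothetical `temperature` parameter), and that "narrowing down the search region" can be modeled by a boolean mask (`src_mask`) over source pixels. The same matrix is then used to carry per-pixel labels from one frame to the next. All function and parameter names are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def affinity(feats_src: List[Vector], feats_tgt: List[Vector],
             temperature: float = 1.0,
             src_mask: Optional[List[bool]] = None) -> List[List[float]]:
    """Illustrative affinity: A[i][j] = softmax over source pixels i of
    <f_src_i, f_tgt_j> / temperature. Each column j sums to 1.
    src_mask (hypothetical) restricts which source pixels may be matched,
    a stand-in for region-level localization narrowing the search."""
    mask = src_mask if src_mask is not None else [True] * len(feats_src)
    if not any(mask):
        raise ValueError("search region must contain at least one source pixel")
    A = [[0.0] * len(feats_tgt) for _ in range(len(feats_src))]
    for j, ft in enumerate(feats_tgt):
        # Scaled dot-product scores; masked-out pixels get -inf.
        scores = [sum(a * b for a, b in zip(fs, ft)) / temperature if mask[i] else -math.inf
                  for i, fs in enumerate(feats_src)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [0.0 if s == -math.inf else math.exp(s - m) for s in scores]
        z = sum(exps)
        for i in range(len(feats_src)):
            A[i][j] = exps[i] / z
    return A


def propagate(A: List[List[float]], labels_src: List[Vector]) -> List[List[float]]:
    """Transfer per-pixel labels: labels_tgt[j] = sum_i A[i][j] * labels_src[i]."""
    n_tgt = len(A[0])
    n_ch = len(labels_src[0])
    return [[sum(A[i][j] * labels_src[i][c] for i in range(len(A))) for c in range(n_ch)]
            for j in range(n_tgt)]
```

For example, with `feats_src = [[1, 0], [0, 1], [1, 0], [0, 1]]`, one-hot labels equal to those features, and `feats_tgt = [[0, 1], [1, 0]]`, a small temperature such as `0.01` makes each target pixel copy the label of the source pixels whose features match it. Passing `src_mask=[True, True, False, False]` forces the match into the first two source pixels, mimicking how a localized region removes ambiguous candidates elsewhere in the frame.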
Author Information
Xueting Li (University of California, Merced)
Sifei Liu (NVIDIA)
Shalini De Mello (NVIDIA)

Shalini De Mello is a Principal Research Scientist and Research Lead in the Learning and Perception Research group at NVIDIA, which she joined in 2013. Her research interests are in human-centric vision (face and gaze analysis) and in data-efficient (synth2real, low-shot, self-supervised and multimodal) machine learning. She has co-authored 48 peer-reviewed publications and holds 38 patents. Her inventions have contributed to several NVIDIA products, including DriveIX and Maxine. Previously, she worked at Texas Instruments and AT&T Laboratories. She received her doctoral degree in Electrical and Computer Engineering from the University of Texas at Austin.
Xiaolong Wang (CMU)
Jan Kautz (NVIDIA)
Ming-Hsuan Yang (Google / UC Merced)
More from the Same Authors
2021 Spotlight: Intriguing Properties of Vision Transformers
Muhammad Muzammal Naseer · Kanchana Ranasinghe · Salman H Khan · Munawar Hayat · Fahad Shahbaz Khan · Ming-Hsuan Yang
2021 : Physics Informed RNN-DCT Networks for Time-Dependent Partial Differential Equations
Benjamin Wu · Oliver Hennigh · Jan Kautz · Sanjay Choudhry · Wonmin Byeon
2023 Poster: ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections
Chun-Han Yao · Amit Raj · Wei-Chih Hung · Michael Rubinstein · Yuanzhen Li · Ming-Hsuan Yang · Varun Jampani
2023 Poster: A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
Junyi Zhang · Charles Herrmann · Junhwa Hur · Luisa Polania Cabrera · Varun Jampani · Deqing Sun · Ming-Hsuan Yang
2023 Poster: AIMS: All-Inclusive Multi-Level Segmentation
Lu Qi · Jason Kuen · Weidong Guo · Jiuxiang Gu · Zhe Lin · Bo Du · Yu Xu · Ming-Hsuan Yang
2023 Poster: Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection
Cheng-Ju Ho · Chen-Hsuan Tai · Yen-Yu Lin · Ming-Hsuan Yang · Yi-Hsuan Tsai
2023 Poster: Module-wise Adaptive Distillation for Multimodality Foundation Models
Chen Liang · Jiahui Yu · Ming-Hsuan Yang · Matthew Brown · Yin Cui · Tuo Zhao · Boqing Gong · Tianyi Zhou
2023 Poster: SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Lijun Yu · Yong Cheng · Zhiruo Wang · Vivek Kumar · Wolfgang Macherey · Yanping Huang · David Ross · Irfan Essa · Yonatan Bisk · Ming-Hsuan Yang · Kevin Murphy · Alexander Hauptmann · Lu Jiang
2023 Poster: Generalizable One-shot Neural Head Avatar
Xueting Li · Shalini De Mello · Sifei Liu · Koki Nagano · Umar Iqbal · Jan Kautz
2023 Poster: Convolutional State Space Models for Long-Range Spatiotemporal Modeling
Jimmy Smith · Shalini De Mello · Jan Kautz · Scott Linderman · Wonmin Byeon
2023 Poster: Video Timeline Modeling For News Story Understanding
Meng Liu · Mingda Zhang · Jialu Liu · Hanjun Dai · Ming-Hsuan Yang · Shuiwang Ji · Zheyun Feng · Boqing Gong
2022 : Exploiting Human Interactions to Learn Human Attention
Shalini De Mello
2022 Poster: LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery
Chun-Han Yao · Wei-Chih Hung · Yuanzhen Li · Michael Rubinstein · Ming-Hsuan Yang · Varun Jampani
2021 Poster: Intriguing Properties of Vision Transformers
Muhammad Muzammal Naseer · Kanchana Ranasinghe · Salman H Khan · Munawar Hayat · Fahad Shahbaz Khan · Ming-Hsuan Yang
2021 Poster: Learning 3D Dense Correspondence via Canonical Point Autoencoder
An-Chieh Cheng · Xueting Li · Min Sun · Ming-Hsuan Yang · Sifei Liu
2021 Poster: A Contrastive Learning Approach for Training Variational Autoencoder Priors
Jyoti Aneja · Alex Schwing · Jan Kautz · Arash Vahdat
2021 Poster: Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing
Yan-Bo Lin · Hung-Yu Tseng · Hsin-Ying Lee · Yen-Yu Lin · Ming-Hsuan Yang
2021 Poster: Score-based Generative Modeling in Latent Space
Arash Vahdat · Karsten Kreis · Jan Kautz
2021 Poster: Coupled Segmentation and Edge Learning via Dynamic Graph Propagation
Zhiding Yu · Rui Huang · Wonmin Byeon · Sifei Liu · Guilin Liu · Thomas Breuel · Anima Anandkumar · Jan Kautz
2021 Poster: End-to-end Multi-modal Video Temporal Grounding
Yi-Wen Chen · Yi-Hsuan Tsai · Ming-Hsuan Yang
2020 Poster: NVAE: A Deep Hierarchical Variational Autoencoder
Arash Vahdat · Jan Kautz
2020 Spotlight: NVAE: A Deep Hierarchical Variational Autoencoder
Arash Vahdat · Jan Kautz
2020 Poster: Online Adaptation for Consistent Mesh Reconstruction in the Wild
Xueting Li · Sifei Liu · Shalini De Mello · Kihwan Kim · Xiaolong Wang · Ming-Hsuan Yang · Jan Kautz
2020 Poster: Convolutional Tensor-Train LSTM for Spatio-Temporal Learning
Jiahao Su · Wonmin Byeon · Jean Kossaifi · Furong Huang · Jan Kautz · Anima Anandkumar
2020 Poster: Self-Learning Transformations for Improving Gaze and Head Redirection
Yufeng Zheng · Seonwook Park · Xucong Zhang · Shalini De Mello · Otmar Hilliges
2019 Poster: Quadratic Video Interpolation
Xiangyu Xu · Li Siyao · Wenxiu Sun · Qian Yin · Ming-Hsuan Yang
2019 Spotlight: Quadratic Video Interpolation
Xiangyu Xu · Li Siyao · Wenxiu Sun · Qian Yin · Ming-Hsuan Yang
2019 Poster: Few-shot Video-to-Video Synthesis
Ting-Chun Wang · Ming-Yu Liu · Andrew Tao · Guilin Liu · Bryan Catanzaro · Jan Kautz
2019 Poster: Dancing to Music
Hsin-Ying Lee · Xiaodong Yang · Ming-Yu Liu · Ting-Chun Wang · Yu-Ding Lu · Ming-Hsuan Yang · Jan Kautz
2018 : Jan Kautz
Jan Kautz
2018 Poster: Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation
Wenqi Ren · Jiawei Zhang · Lin Ma · Jinshan Pan · Xiaochun Cao · Wangmeng Zuo · Wei Liu · Ming-Hsuan Yang
2018 Poster: Context-aware Synthesis and Placement of Object Instances
Donghoon Lee · Sifei Liu · Jinwei Gu · Ming-Yu Liu · Ming-Hsuan Yang · Jan Kautz
2018 Poster: Video-to-Video Synthesis
Ting-Chun Wang · Ming-Yu Liu · Jun-Yan Zhu · Guilin Liu · Andrew Tao · Jan Kautz · Bryan Catanzaro
2018 Poster: Deep Attentive Tracking via Reciprocative Learning
Shi Pu · Yibing Song · Chao Ma · Honggang Zhang · Ming-Hsuan Yang
2017 : Poster Session (encompasses coffee break)
Beidi Chen · Borja Balle · Daniel Lee · Iuri Frosio · Jitendra Malik · Jan Kautz · Ke Li · Masashi Sugiyama · Miguel A. Carreira-Perpinan · Ramin Raziperchikolaei · Theja Tulabandhula · Yung-Kyun Noh · Adams Wei Yu
2017 Poster: Unsupervised Image-to-Image Translation Networks
Ming-Yu Liu · Thomas Breuel · Jan Kautz
2017 Spotlight: Unsupervised Image-to-Image Translation Networks
Ming-Yu Liu · Thomas Breuel · Jan Kautz
2017 Poster: Learning Affinity via Spatial Propagation Networks
Sifei Liu · Shalini De Mello · Jinwei Gu · Guangyu Zhong · Ming-Hsuan Yang · Jan Kautz
2017 Poster: Semi-Supervised Learning for Optical Flow with Generative Adversarial Networks
Wei-Sheng Lai · Jia-Bin Huang · Ming-Hsuan Yang
2017 Poster: Universal Style Transfer via Feature Transforms
Yijun Li · Chen Fang · Jimei Yang · Zhaowen Wang · Xin Lu · Ming-Hsuan Yang
2015 Poster: Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis
Jimei Yang · Scott E Reed · Ming-Hsuan Yang · Honglak Lee