Timezone: »
Self-supervised pre-training recently demonstrates success on large-scale multimodal data, and state-of-the-art contrastive learning methods often enforce the feature consistency from cross-modality inputs, such as video/audio or video/text pairs. Despite its convenience to formulate and leverage in practice, such cross-modality alignment (CMA) is only a weak and noisy supervision, since two modalities can be semantically misaligned even they are temporally aligned. For example, even in the (often adopted) instructional videos, a speaker can sometimes refer to something that is not visually present in the current frame; and the semantic misalignment would only be more unpredictable for the raw videos collected from unconstrained internet sources. We conjecture that might cause conflicts and biases among modalities, and may hence prohibit CMA from scaling up to training with larger and more heterogeneous data. This paper first verifies our conjecture by observing that, even in the latest VATT pre-training using only narrated videos, there exist strong gradient conflicts between different CMA losses within the same sample triplet (video, audio, text), indicating them as the noisy source of supervision. We then propose to harmonize such gradients during pre-training, via two techniques: (i) cross-modality gradient realignment: modifying different CMA loss gradients for one sample triplet, so that their gradient directions are in more agreement; and (ii) gradient-based curriculum learning: leveraging the gradient conflict information on an indicator of sample noisiness, to develop a curriculum learning strategy to prioritize training with less noisy sample triplets. Applying those gradient harmonization techniques to pre-training VATT on the HowTo100M dataset, we consistently improve its performance on different downstream tasks. Moreover, we are able to scale VATT pre-training to more complicated non-narrative Youtube8M dataset to further improve the state-of-the-arts.
Author Information
Junru Wu (Texas A&M University)
Yi Liang (Research, Google)
feng han (google)
Hassan Akbari (Google)
Zhangyang Wang (University of Texas at Austin)
Cong Yu (Google Research)
More from the Same Authors
-
2022 : HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing »
Tianlong Chen · Chengyue Gong · Daniel Diaz · Xuxi Chen · Jordan Wells · Qiang Liu · Zhangyang Wang · Andrew Ellington · Alex Dimakis · Adam Klivans -
2022 Spotlight: Sparse Winning Tickets are Data-Efficient Image Recognizers »
Mukund Varma T · Xuxi Chen · Zhenyu Zhang · Tianlong Chen · Subhashini Venugopalan · Zhangyang Wang -
2022 Poster: Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets »
Ruisi Cai · Zhenyu Zhang · Tianlong Chen · Xiaohan Chen · Zhangyang Wang -
2022 Poster: Augmentations in Hypergraph Contrastive Learning: Fabricated and Generative »
Tianxin Wei · Yuning You · Tianlong Chen · Yang Shen · Jingrui He · Zhangyang Wang -
2022 Poster: Signal Processing for Implicit Neural Representations »
Dejia Xu · Peihao Wang · Yifan Jiang · Zhiwen Fan · Zhangyang Wang -
2022 Poster: Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation »
Ziyu Jiang · Xuxi Chen · Xueqin Huang · Xianzhi Du · Denny Zhou · Zhangyang Wang -
2022 Poster: Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork »
Haotao Wang · Junyuan Hong · Aston Zhang · Jiayu Zhou · Zhangyang Wang -
2022 Poster: Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis »
Wuyang Chen · Wei Huang · Xinyu Gong · Boris Hanin · Zhangyang Wang -
2022 Poster: Sparse Winning Tickets are Data-Efficient Image Recognizers »
Mukund Varma T · Xuxi Chen · Zhenyu Zhang · Tianlong Chen · Subhashini Venugopalan · Zhangyang Wang -
2022 Poster: Symbolic Distillation for Learned TCP Congestion Control »
S P Sharan · Wenqing Zheng · Kuo-Feng Hsu · Jiarong Xing · Ang Chen · Zhangyang Wang -
2022 Poster: M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design »
hanxue liang · Zhiwen Fan · Rishov Sarkar · Ziyu Jiang · Tianlong Chen · Kai Zou · Yu Cheng · Cong Hao · Zhangyang Wang -
2022 Poster: Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again »
AJAY JAISWAL · Peihao Wang · Tianlong Chen · Justin Rousseau · Ying Ding · Zhangyang Wang -
2022 Poster: A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking »
Keyu Duan · Zirui Liu · Peihao Wang · Wenqing Zheng · Kaixiong Zhou · Tianlong Chen · Xia Hu · Zhangyang Wang -
2021 Poster: Stronger NAS with Weaker Predictors »
Junru Wu · Xiyang Dai · Dongdong Chen · Yinpeng Chen · Mengchen Liu · Ye Yu · Zhangyang Wang · Zicheng Liu · Mei Chen · Lu Yuan -
2021 Poster: VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text »
Hassan Akbari · Liangzhe Yuan · Rui Qian · Wei-Hong Chuang · Shih-Fu Chang · Yin Cui · Boqing Gong -
2020 Workshop: Second Workshop on AI for Humanitarian Assistance and Disaster Response »
Ritwik Gupta · Robin Murphy · Eric Heim · Zhangyang Wang · Bryce Goodman · Nirav Patel · Piotr Bilinski · Edoardo Nemni -
2020 Poster: Graph Contrastive Learning with Augmentations »
Yuning You · Tianlong Chen · Yongduo Sui · Ting Chen · Zhangyang Wang · Yang Shen -
2020 Poster: MATE: Plugging in Model Awareness to Task Embedding for Meta Learning »
Xiaohan Chen · Zhangyang Wang · Siyu Tang · Krikamol Muandet -
2020 Poster: Robust Pre-Training by Adversarial Contrastive Learning »
Ziyu Jiang · Tianlong Chen · Ting Chen · Zhangyang Wang -
2020 Poster: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang Wang -
2020 Spotlight: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang Wang -
2020 Poster: Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free »
Haotao Wang · Tianlong Chen · Shupeng Gui · TingKuei Hu · Ji Liu · Zhangyang Wang -
2020 Poster: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training »
Yonggan Fu · Haoran You · Yang Zhao · Yue Wang · Chaojian Li · Kailash Gopalakrishnan · Zhangyang Wang · Yingyan Lin -
2020 Poster: The Lottery Ticket Hypothesis for Pre-trained BERT Networks »
Tianlong Chen · Jonathan Frankle · Shiyu Chang · Sijia Liu · Yang Zhang · Zhangyang Wang · Michael Carbin -
2020 Poster: ShiftAddNet: A Hardware-Inspired Deep Network »
Haoran You · Xiaohan Chen · Yongan Zhang · Chaojian Li · Sicheng Li · Zihao Liu · Zhangyang Wang · Yingyan Lin