Timezone: »
Self-supervised pre-training recently demonstrates success on large-scale multimodal data, and state-of-the-art contrastive learning methods often enforce the feature consistency from cross-modality inputs, such as video/audio or video/text pairs. Despite its convenience to formulate and leverage in practice, such cross-modality alignment (CMA) is only a weak and noisy supervision, since two modalities can be semantically misaligned even they are temporally aligned. For example, even in the (often adopted) instructional videos, a speaker can sometimes refer to something that is not visually present in the current frame; and the semantic misalignment would only be more unpredictable for the raw videos collected from unconstrained internet sources. We conjecture that might cause conflicts and biases among modalities, and may hence prohibit CMA from scaling up to training with larger and more heterogeneous data. This paper first verifies our conjecture by observing that, even in the latest VATT pre-training using only narrated videos, there exist strong gradient conflicts between different CMA losses within the same sample triplet (video, audio, text), indicating them as the noisy source of supervision. We then propose to harmonize such gradients during pre-training, via two techniques: (i) cross-modality gradient realignment: modifying different CMA loss gradients for one sample triplet, so that their gradient directions are in more agreement; and (ii) gradient-based curriculum learning: leveraging the gradient conflict information on an indicator of sample noisiness, to develop a curriculum learning strategy to prioritize training with less noisy sample triplets. Applying those gradient harmonization techniques to pre-training VATT on the HowTo100M dataset, we consistently improve its performance on different downstream tasks. Moreover, we are able to scale VATT pre-training to more complicated non-narrative Youtube8M dataset to further improve the state-of-the-arts.
Author Information
Junru Wu (Texas A&M University)
Yi Liang (Research, Google)
feng han (google)
Hassan Akbari (Google)
Zhangyang "Atlas" Wang (University of Texas at Austin)
Cong Yu (Google Research)
More from the Same Authors
-
2022 : HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing »
Tianlong Chen · Chengyue Gong · Daniel Diaz · Xuxi Chen · Jordan Wells · Qiang Liu · Zhangyang "Atlas" Wang · Andrew Ellington · Alex Dimakis · Adam Klivans -
2023 Poster: Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models »
Zhendong Wang · Yifan Jiang · Huangjie Zheng · Peihao Wang · Pengcheng He · Zhangyang "Atlas" Wang · Weizhu Chen · Mingyuan Zhou -
2023 Poster: The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter »
AJAY JAISWAL · Shiwei Liu · Tianlong Chen · Zhangyang "Atlas" Wang -
2023 Poster: Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling »
Haotao Wang · Ziyu Jiang · Yuning You · Yan Han · Gaowen Liu · Jayanth Srinivasa · Ramana Kompella · Zhangyang "Atlas" Wang -
2023 Poster: Don’t just prune by magnitude! Your mask topology is a secret weapon »
Duc Hoang · Souvik Kundu · Shiwei Liu · Zhangyang "Atlas" Wang -
2023 Poster: Dynamic Sparsity Is Channel-Level Sparsity Learner »
Lu Yin · Gen Li · Meng Fang · Li Shen · Tianjin Huang · Zhangyang "Atlas" Wang · Vlado Menkovski · Xiaolong Ma · Mykola Pechenizkiy · Shiwei Liu -
2023 Poster: In-Context Learning Unlocked for Diffusion Models »
Zhendong Wang · Yifan Jiang · Yadong Lu · yelong shen · Pengcheng He · Weizhu Chen · Zhangyang "Atlas" Wang · Mingyuan Zhou -
2023 Poster: H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models »
Zhenyu Zhang · Ying Sheng · Tianyi Zhou · Tianlong Chen · Lianmin Zheng · Ruisi Cai · Zhao Song · Yuandong Tian · Christopher Ré · Clark Barrett · Zhangyang "Atlas" Wang · Beidi Chen -
2022 Spotlight: Sparse Winning Tickets are Data-Efficient Image Recognizers »
Mukund Varma T · Xuxi Chen · Zhenyu Zhang · Tianlong Chen · Subhashini Venugopalan · Zhangyang "Atlas" Wang -
2022 Poster: Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets »
Ruisi Cai · Zhenyu Zhang · Tianlong Chen · Xiaohan Chen · Zhangyang "Atlas" Wang -
2022 Poster: Augmentations in Hypergraph Contrastive Learning: Fabricated and Generative »
Tianxin Wei · Yuning You · Tianlong Chen · Yang Shen · Jingrui He · Zhangyang "Atlas" Wang -
2022 Poster: Signal Processing for Implicit Neural Representations »
Dejia Xu · Peihao Wang · Yifan Jiang · Zhiwen Fan · Zhangyang "Atlas" Wang -
2022 Poster: Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation »
Ziyu Jiang · Xuxi Chen · Xueqin Huang · Xianzhi Du · Denny Zhou · Zhangyang "Atlas" Wang -
2022 Poster: Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork »
Haotao Wang · Junyuan Hong · Aston Zhang · Jiayu Zhou · Zhangyang "Atlas" Wang -
2022 Poster: Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis »
Wuyang Chen · Wei Huang · Xinyu Gong · Boris Hanin · Zhangyang "Atlas" Wang -
2022 Poster: Sparse Winning Tickets are Data-Efficient Image Recognizers »
Mukund Varma T · Xuxi Chen · Zhenyu Zhang · Tianlong Chen · Subhashini Venugopalan · Zhangyang "Atlas" Wang -
2022 Poster: Symbolic Distillation for Learned TCP Congestion Control »
S P Sharan · Wenqing Zheng · Kuo-Feng Hsu · Jiarong Xing · Ang Chen · Zhangyang "Atlas" Wang -
2022 Poster: M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design »
hanxue liang · Zhiwen Fan · Rishov Sarkar · Ziyu Jiang · Tianlong Chen · Kai Zou · Yu Cheng · Cong Hao · Zhangyang "Atlas" Wang -
2022 Poster: Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again »
AJAY JAISWAL · Peihao Wang · Tianlong Chen · Justin Rousseau · Ying Ding · Zhangyang "Atlas" Wang -
2022 Poster: A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking »
Keyu Duan · Zirui Liu · Peihao Wang · Wenqing Zheng · Kaixiong Zhou · Tianlong Chen · Xia Hu · Zhangyang "Atlas" Wang -
2021 Poster: Stronger NAS with Weaker Predictors »
Junru Wu · Xiyang Dai · Dongdong Chen · Yinpeng Chen · Mengchen Liu · Ye Yu · Zhangyang Wang · Zicheng Liu · Mei Chen · Lu Yuan -
2021 Poster: VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text »
Hassan Akbari · Liangzhe Yuan · Rui Qian · Wei-Hong Chuang · Shih-Fu Chang · Yin Cui · Boqing Gong -
2020 Workshop: Second Workshop on AI for Humanitarian Assistance and Disaster Response »
Ritwik Gupta · Robin Murphy · Eric Heim · Zhangyang "Atlas" Wang · Bryce Goodman · Nirav Patel · Piotr Bilinski · Edoardo Nemni -
2020 Poster: Graph Contrastive Learning with Augmentations »
Yuning You · Tianlong Chen · Yongduo Sui · Ting Chen · Zhangyang "Atlas" Wang · Yang Shen -
2020 Poster: MATE: Plugging in Model Awareness to Task Embedding for Meta Learning »
Xiaohan Chen · Zhangyang "Atlas" Wang · Siyu Tang · Krikamol Muandet -
2020 Poster: Robust Pre-Training by Adversarial Contrastive Learning »
Ziyu Jiang · Tianlong Chen · Ting Chen · Zhangyang "Atlas" Wang -
2020 Poster: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang "Atlas" Wang -
2020 Spotlight: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang "Atlas" Wang -
2020 Poster: Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free »
Haotao Wang · Tianlong Chen · Shupeng Gui · TingKuei Hu · Ji Liu · Zhangyang "Atlas" Wang -
2020 Poster: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training »
Yonggan Fu · Haoran You · Yang Zhao · Yue Wang · Chaojian Li · Kailash Gopalakrishnan · Zhangyang "Atlas" Wang · Yingyan Lin -
2020 Poster: The Lottery Ticket Hypothesis for Pre-trained BERT Networks »
Tianlong Chen · Jonathan Frankle · Shiyu Chang · Sijia Liu · Yang Zhang · Zhangyang "Atlas" Wang · Michael Carbin -
2020 Poster: ShiftAddNet: A Hardware-Inspired Deep Network »
Haoran You · Xiaohan Chen · Yongan Zhang · Chaojian Li · Sicheng Li · Zihao Liu · Zhangyang "Atlas" Wang · Yingyan Lin