Contrastive learning (CL) has become the de facto technique for self-supervised representation learning (SSL), with impressive empirical success in areas such as multi-modal representation learning. However, the traditional CL loss considers only the negative samples within a minibatch, which can cause biased gradients due to the non-decomposability of the loss. For the first time, we consider optimizing a more general contrastive loss in which each data sample is associated with an infinite number of negative samples. We show that directly applying minibatch stochastic optimization to this loss leads to gradient bias. To remedy this, we propose an efficient Bayesian data-augmentation technique that converts the contrastive loss into a decomposable one, to which standard stochastic optimization can be applied without gradient bias. Specifically, our augmented loss defines a joint distribution over the model parameters and the augmented parameters, which can be conveniently optimized by a proposed stochastic expectation-maximization algorithm. Our framework is general and is closely related to several popular SSL algorithms. We verify our framework on both small-scale models and several large foundation models, including SSL on ImageNet and SSL for vision-language representation learning. Experimental results indicate the existence of gradient bias in all cases and demonstrate the effectiveness of the proposed method in improving on the previous state of the art. Remarkably, our method outperforms the strong MoCo-v3 baseline under the same hyper-parameter setting with only around half the minibatch size, and also obtains strong results on the recent public benchmark ELEVATER for few-shot image classification.
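The gradient bias discussed in the abstract stems from the non-decomposability of the contrastive loss: its partition term is the log of a sum over all negatives, and the log of a minibatch average is not an unbiased estimate of the log of the full average (by Jensen's inequality, since log is concave). A minimal NumPy sketch, using hypothetical similarity scores rather than the paper's actual model, illustrates the effect:

```python
import numpy as np

rng = np.random.default_rng(0)
N, b, trials = 1000, 32, 20000

# hypothetical similarity scores of one anchor against N negatives
s = rng.normal(size=N)

# full-batch partition term: log of the mean exp-similarity over all negatives
full = np.log(np.mean(np.exp(s)))

# minibatch estimate: the same term computed on a random subset of size b,
# averaged over many draws to approximate its expectation
est = np.mean([np.log(np.mean(np.exp(rng.choice(s, b, replace=False))))
               for _ in range(trials)])

print(full, est)  # est systematically undershoots full: E[log X] <= log E[X]
```

The minibatch mean of exp-similarities is an unbiased estimate of the full mean, but taking its log introduces a systematic downward bias, so minibatch gradients of the log term are biased as well; the decomposable augmented loss proposed in the paper is designed to avoid exactly this.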
Author Information
Changyou Chen (University at Buffalo)
Jianyi Zhang (Duke University)
Yi Xu (Amazon)
Liqun Chen (Duke University)
Jiali Duan (University of Southern California)
Yiran Chen (Duke University)
Son Tran (Amazon)
Belinda Zeng (Amazon)
Trishul Chilimbi (Amazon)
More from the Same Authors
- 2021: Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning » Xuanli He · Iman Keivanloo · Yi Xu · Xiang He · Belinda Zeng · Santosh Rajagopalan · Trishul Chilimbi
- 2021: CTR-BERT: Cost-effective knowledge distillation for billion-parameter teacher models » Aashiq Muhamed · Iman Keivanloo · Sujan Perera · James Mracek · Yi Xu · Qingjun Cui · Santosh Rajagopalan · Belinda Zeng · Trishul Chilimbi
- 2022: Fine-grain Inference on Out-of-Distribution Data with Hierarchical Classification » Randolph Linderman · Jingyang Zhang · Nathan Inkawhich · Hai Li · Yiran Chen
- 2023 Poster: Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels » Jian Chen · Ruiyi Zhang · Tong Yu · Rohan Sharma · Zhiqiang Xu · Tong Sun · Changyou Chen
- 2021 Poster: FL-WBC: Enhancing Robustness against Model Poisoning Attacks in Federated Learning from a Client Perspective » Jingwei Sun · Ang Li · Louis DiValentin · Amin Hassanzadeh · Yiran Chen · Hai Li
- 2020 Poster: Learning Manifold Implicitly via Explicit Heat-Kernel Learning » Yufan Zhou · Changyou Chen · Jinhui Xu
- 2020 Poster: Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability » Nathan Inkawhich · Kevin J Liang · Binghui Wang · Matthew Inkawhich · Lawrence Carin · Yiran Chen
- 2020 Poster: Bayesian Multi-type Mean Field Multi-agent Imitation Learning » Fan Yang · Alina Vereshchaka · Changyou Chen · Wen Dong
- 2020 Spotlight: Bayesian Multi-type Mean Field Multi-agent Imitation Learning » Fan Yang · Alina Vereshchaka · Changyou Chen · Wen Dong
- 2019 Poster: Certified Adversarial Robustness with Additive Noise » Bai Li · Changyou Chen · Wenlin Wang · Lawrence Carin
- 2019 Poster: Reward Constrained Interactive Recommendation with Natural Language Feedback » Ruiyi Zhang · Tong Yu · Yilin Shen · Hongxia Jin · Changyou Chen
- 2018 Poster: Generalized Inverse Optimization through Online Learning » Chaosheng Dong · Yiran Chen · Bo Zeng
- 2017 Poster: TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning » Wei Wen · Cong Xu · Feng Yan · Chunpeng Wu · Yandan Wang · Yiran Chen · Hai Li
- 2017 Poster: ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching » Chunyuan Li · Hao Liu · Changyou Chen · Yuchen Pu · Liqun Chen · Ricardo Henao · Lawrence Carin
- 2017 Oral: TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning » Wei Wen · Cong Xu · Feng Yan · Chunpeng Wu · Yandan Wang · Yiran Chen · Hai Li
- 2016 Poster: Towards Unifying Hamiltonian Monte Carlo and Slice Sampling » Yizhe Zhang · Xiangyu Wang · Changyou Chen · Ricardo Henao · Kai Fan · Lawrence Carin
- 2016 Poster: Stochastic Gradient MCMC with Stale Gradients » Changyou Chen · Nan Ding · Chunyuan Li · Yizhe Zhang · Lawrence Carin
- 2015 Poster: On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators » Changyou Chen · Nan Ding · Lawrence Carin
- 2014 Poster: Bayesian Sampling Using Stochastic Gradient Thermostats » Nan Ding · Youhan Fang · Ryan Babbush · Changyou Chen · Robert D Skeel · Hartmut Neven
- 2014 Poster: Robust Bayesian Max-Margin Clustering » Changyou Chen · Jun Zhu · Xinhua Zhang