We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of ViT architectures across different tasks. To tackle this problem, we use an SSL pipeline consisting of un/self-supervised pre-training, followed by supervised fine-tuning, and finally semi-supervised fine-tuning. At the semi-supervised fine-tuning stage, we adopt an exponential moving average (EMA)-Teacher framework instead of the popular FixMatch, since the former is more stable and delivers higher accuracy for semi-supervised vision transformers. In addition, we propose a probabilistic pseudo mixup mechanism to interpolate unlabeled samples and their pseudo labels for improved regularization, which is important for training ViTs with weak inductive bias. Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than its CNN counterparts in the semi-supervised classification setting. Semi-ViT also enjoys the scalability benefits of ViTs, which can be readily scaled up to large models with increasing accuracy. For example, Semi-ViT-Huge achieves an impressive 80% top-1 accuracy on ImageNet using only 1% of the labels, which is comparable to Inception-v4 using 100% of the ImageNet labels. The code is available at https://github.com/amazon-science/semi-vit.
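The two ingredients named in the abstract, an EMA teacher and mixup over unlabeled samples with their pseudo labels, can be illustrated with a minimal sketch. This is not the released Semi-ViT implementation: the function names `ema_update`, `pseudo_mixup`, and `sample_lambda`, the momentum value, and the Beta parameter `alpha` are all assumptions chosen for illustration, weights and samples are plain lists of floats, and the probabilistic selection of which samples take part in the mixup is not specified in the abstract and is left out.

```python
import random


def ema_update(teacher, student, momentum=0.999):
    """Move each teacher weight toward the matching student weight.

    teacher_new = momentum * teacher + (1 - momentum) * student
    The momentum value is an assumed default, not taken from the paper.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def sample_lambda(alpha=0.8, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha), as in standard mixup.

    alpha is a hypothetical setting used only for this sketch.
    """
    return rng.betavariate(alpha, alpha)


def pseudo_mixup(x_a, y_a, x_b, y_b, lam):
    """Interpolate two unlabeled samples and their pseudo-label distributions.

    x_a, x_b: flattened inputs; y_a, y_b: teacher-predicted class probabilities.
    The same coefficient lam mixes both the inputs and the pseudo labels.
    """
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


if __name__ == "__main__":
    # One EMA step on toy two-parameter "models".
    teacher = ema_update([1.0, 0.0], [0.0, 1.0], momentum=0.9)
    print(teacher)

    # Mix two toy samples whose pseudo labels come from the teacher.
    lam = sample_lambda()
    x_mix, y_mix = pseudo_mixup([0.0, 2.0], [1.0, 0.0], [4.0, 6.0], [0.0, 1.0], lam)
    print(lam, x_mix, y_mix)
```

Because the teacher is an average of past student weights rather than a network trained directly, its pseudo labels change slowly between steps, which is one plausible reading of the stability the abstract attributes to the EMA-Teacher framework.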
Author Information
Zhaowei Cai (Amazon)
Avinash Ravichandran (AWS)
Paolo Favaro (University of Bern)
Manchen Wang (Amazon)
Davide Modolo (Amazon)
Rahul Bhotika (Optum Labs)
Zhuowen Tu (University of California, San Diego)
Stefano Soatto (UCLA)
Stefano Soatto received his Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 1996; he joined UCLA in 2000 after being Assistant and then Associate Professor of Electrical Engineering and Biomedical Engineering at Washington University, and Research Associate in Applied Sciences at Harvard University. Between 1995 and 1998 he was also Ricercatore in the Department of Mathematics and Computer Science at the University of Udine, Italy. He received his D.Ing. degree (highest honors) from the University of Padova, Italy in 1992. His general research interests are in Computer Vision and Nonlinear Estimation and Control Theory. In particular, he is interested in ways for computers to use sensory information to interact with humans and the environment. Dr. Soatto is the recipient of the David Marr Prize for work on Euclidean reconstruction and reprojection up to subgroups. He also received the Siemens Prize with the Outstanding Paper Award from the IEEE Computer Society for his work on optimal structure from motion. He received the National Science Foundation Career Award and the Okawa Foundation Grant. He is a Member of the Editorial Board of the International Journal of Computer Vision (IJCV) and Foundations and Trends in Computer Graphics and Vision. He is the founder and director of the UCLA Vision Lab; more information is available at http://vision.ucla.edu
More from the Same Authors
-
2021 Spotlight: Uniform Sampling over Episode Difficulty »
Sébastien Arnold · Guneet Dhillon · Avinash Ravichandran · Stefano Soatto -
2021 Spotlight: Long Short-Term Transformer for Online Action Detection »
Mingze Xu · Yuanjun Xiong · Hao Chen · Xinyu Li · Wei Xia · Zhuowen Tu · Stefano Soatto -
2022 Poster: On Leave-One-Out Conditional Mutual Information For Generalization »
Mohamad Rida Rammal · Alessandro Achille · Aditya Golatkar · Suhas Diggavi · Stefano Soatto -
2022 : On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning »
Yifan Xu · Nicklas Hansen · Zirui Wang · Yung-Chieh Chan · Hao Su · Zhuowen Tu -
2022 : Evaluating Worst Case Adversarial Weather Perturbations Robustness »
Yihan Wang · Yunhao Ba · Howard Zhang · Huan Zhang · Achuta Kadambi · Stefano Soatto · Alex Wong · Cho-Jui Hsieh -
2023 Poster: Gacs-Korner Common Information Variational Autoencoder »
Michael Kleinman · Alessandro Achille · Stefano Soatto · Jonathan Kao -
2023 Poster: Your representations are in the network: composable and parallel adaptation for large scale models »
Yonatan Dukler · Alessandro Achille · Hao Yang · Varsha Vivek · Luca Zancato · Benjamin Bowman · Avinash Ravichandran · Charless Fowlkes · Ashwin Swaminathan · Stefano Soatto -
2022 Poster: An In-depth Study of Stochastic Backpropagation »
Jun Fang · Mingze Xu · Hao Chen · Bing Shuai · Zhuowen Tu · Joseph Tighe -
2022 Poster: MOVE: Unsupervised Movable Object Segmentation and Detection »
Adam Bielski · Paolo Favaro -
2021 Poster: Uniform Sampling over Episode Difficulty »
Sébastien Arnold · Guneet Dhillon · Avinash Ravichandran · Stefano Soatto -
2021 Poster: Long Short-Term Transformer for Online Action Detection »
Mingze Xu · Yuanjun Xiong · Hao Chen · Xinyu Li · Wei Xia · Zhuowen Tu · Stefano Soatto -
2020 Poster: Predicting Training Time Without Training »
Luca Zancato · Alessandro Achille · Avinash Ravichandran · Rahul Bhotika · Stefano Soatto -
2018 : Poster Session »
Sujay Sanghavi · Vatsal Shah · Yanyao Shen · Tianchen Zhao · Yuandong Tian · Tomer Galanti · Mufan Li · Gilad Cohen · Daniel Rothchild · Aristide Baratin · Devansh Arpit · Vagelis Papalexakis · Michael Perlmutter · Ashok Vardhan Makkuva · Pim de Haan · Yingyan Lin · Wanmo Kang · Cheolhyoung Lee · Hao Shen · Sho Yaida · Dan Roberts · Nadav Cohen · Philippe Casgrain · Dejiao Zhang · Tengyu Ma · Avinash Ravichandran · Julian Emilio Salazar · Bo Li · Davis Liang · Christopher Wong · Glen Bigan Mbeng · Animesh Garg -
2018 : Plenary Talk 3 »
Stefano Soatto -
2017 Poster: Deep Mean-Shift Priors for Image Restoration »
Siavash Arjomand Bigdeli · Matthias Zwicker · Paolo Favaro · Meiguang Jin -
2017 Spotlight: Deep Mean-Shift Priors for Image Restoration »
Siavash Arjomand Bigdeli · Matthias Zwicker · Paolo Favaro · Meiguang Jin -
2017 Poster: Introspective Classification with Convolutional Nets »
Long Jin · Justin Lazarow · Zhuowen Tu -
2010 Tutorial: Vision-Based Control, Control-Based Vision, and the Information Knot That Ties Them »
Stefano Soatto -
2010 Poster: Occlusion Detection and Motion Estimation with Convex Optimization »
Alper Ayvaci · Michalis Raptis · Stefano Soatto -
2006 Poster: Detecting Humans via Their Pose »
Alessandro Bissacco · Ming-Hsuan Yang · Stefano Soatto