Timezone: »
Recently vision transformers have been shown to be competitive with convolution-based methods (CNNs) broadly across multiple vision tasks. The less restrictive inductive bias of transformers endows greater representational capacity in comparison with CNNs. However, in the image classification setting this flexibility comes with a trade-off with respect to sample efficiency, where transformers require ImageNet-scale training. This notion has carried over to video where transformers have not yet been explored for video classification in the low-labeled or semi-supervised settings. Our work empirically explores the low data regime for video classification and discovers that, surprisingly, transformers perform extremely well in the low-labeled video setting compared to CNNs. We specifically evaluate video vision transformers across two contrasting video datasets (Kinetics-400 and SomethingSomething-V2) and perform thorough analysis and ablation studies to explain this observation using the predominant features of video transformer architectures. We even show that using just the labeled data, transformers significantly outperform complex semi-supervised CNN methods that leverage large-scale unlabeled data as well. Our experiments inform our recommendation that semi-supervised learning video work should consider the use of video transformers in the future.
Author Information
Farrukh Rahman (Microsoft / Georgia Tech)
Ömer Mubarek (Georgia Institute of Technology)
Zsolt Kira (Georgia Institute of Techology)
More from the Same Authors
-
2021 Spotlight: Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · John Turner · Noah Maestre · Mustafa Mukadam · Devendra Singh Chaplot · Oleksandr Maksymets · Aaron Gokaslan · Vladimír Vondruš · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 Spotlight: A Geometric Perspective towards Neural Calibration via Sensitivity Decomposition »
Junjiao Tian · Dylan Yung · Yen-Chang Hsu · Zsolt Kira -
2021 : Exploring Covariate and Concept Shift for Out-of-Distribution Detection »
Junjiao Tian · Yen-Chang Hsu · Yilin Shen · Hongxia Jin · Zsolt Kira -
2022 : Fifteen-minute Competition Overview Video »
Dhruv Batra · Manolis Savva · Zsolt Kira · Vincent-Pierre Berges · Karmesh Yadav · Angel Chang · Andrew Szot · Alexander Clegg · Aaron Gokaslan -
2023 Competition: The HomeRobot Open Vocabulary Mobile Manipulation Challenge »
Sriram Yenamandra · Arun Ramachandran · Mukul Khanna · Karmesh Yadav · Devendra Singh Chaplot · Gunjan Chhablani · Alexander Clegg · Theophile Gervet · Vidhi Jain · Ruslan Partsey · Ram Ramrakhya · Andrew Szot · Austin Wang · Tsung-Yen Yang · Aaron Edsinger · Charles Kemp · Binit Shah · Zsolt Kira · Dhruv Batra · Roozbeh Mottaghi · Yonatan Bisk · Chris Paxton -
2022 Competition: Habitat Rearrangement Challenge »
Andrew Szot · Karmesh Yadav · Alexander Clegg · Vincent-Pierre Berges · Aaron Gokaslan · Angel Chang · Manolis Savva · Zsolt Kira · Dhruv Batra -
2022 : Panel »
Tyler Hayes · Tinne Tuytelaars · Subutai Ahmad · João Sacramento · Zsolt Kira · Hava Siegelmann · Christopher Summerfield -
2022 Poster: Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks »
Yen-Cheng Liu · CHIH-YAO MA · Junjiao Tian · Zijian He · Zsolt Kira -
2021 : Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · Noah Maestre · Mustafa Mukadam · Oleksandr Maksymets · Aaron Gokaslan · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 : Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · Noah Maestre · Mustafa Mukadam · Oleksandr Maksymets · Aaron Gokaslan · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 Poster: A Geometric Perspective towards Neural Calibration via Sensitivity Decomposition »
Junjiao Tian · Dylan Yung · Yen-Chang Hsu · Zsolt Kira -
2021 Poster: Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · John Turner · Noah Maestre · Mustafa Mukadam · Devendra Singh Chaplot · Oleksandr Maksymets · Aaron Gokaslan · Vladimír Vondruš · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2020 Poster: Posterior Re-calibration for Imbalanced Datasets »
Junjiao Tian · Yen-Cheng Liu · Nathaniel Glaser · Yen-Chang Hsu · Zsolt Kira -
2018 : Lunch & Posters »
Haytham Fayek · German Parisi · Brian Xu · Pramod Kaushik Mudrakarta · Sophie Cerf · Sarah Wassermann · Davit Soselia · Rahaf Aljundi · Mohamed Elhoseiny · Frantzeska Lavda · Kevin J Liang · Arslan Chaudhry · Sanmit Narvekar · Vincenzo Lomonaco · Wesley Chung · Michael Chang · Ying Zhao · Zsolt Kira · Pouya Bashivan · Banafsheh Rafiee · Oleksiy Ostapenko · Andrew Jones · Christos Kaplanis · Sinan Kalkan · Dan Teng · Xu He · Vincent Liu · Somjit Nath · Sungsoo Ahn · Ting Chen · Shenyang Huang · Yash Chandak · Nathan Sprague · Martin Schrimpf · Tony Kendall · Jonathan Richard Schwarz · Michael Li · Yunshu Du · Yen-Chang Hsu · Samira Abnar · Bo Wang