In this paper, we introduce a novel visual representation learning approach that relies on a handful of adaptively learned tokens and is applicable to both image and video understanding tasks. Instead of relying on hand-designed splitting strategies to obtain visual tokens and processing a large number of densely sampled patches for attention, our approach learns to mine important tokens in visual data. This makes it possible to find a few important visual tokens efficiently and effectively, and enables modeling of pairwise attention between such tokens over a longer temporal horizon for videos or over the spatial content of image frames. Our experiments demonstrate strong performance on several challenging benchmarks for video recognition tasks. Importantly, because our tokens are adaptive, we achieve competitive results at significantly reduced computational cost. We establish new state-of-the-art results on multiple video datasets, including Kinetics-400, Kinetics-600, Charades, and AViD.
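To make the idea of pooling many patches into a few adaptive tokens more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the linear-plus-sigmoid scoring function, the names `learn_tokens` and `score_weights`, and the sizes `N`, `C`, and `S` are all assumptions chosen for illustration. Each of the `S` hypothetical weight vectors yields one soft attention map over the `N` patches, and each token is the attention-weighted average of the patch features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def learn_tokens(patches, score_weights):
    """Pool N patch features into S tokens using S soft attention maps.

    patches: list of N feature vectors, each of length C.
    score_weights: list of S weight vectors of length C. These stand in
        for learned parameters and are purely hypothetical here.
    """
    n = len(patches)
    c = len(patches[0])
    tokens = []
    for w in score_weights:
        # One attention value in (0, 1) per patch, computed from its features.
        attn = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in patches]
        # Attention-weighted average over all patches gives one token.
        token = [sum(a * x[j] for a, x in zip(attn, patches)) / n for j in range(c)]
        tokens.append(token)
    return tokens


def pairwise_attention_pairs(num_tokens):
    """Number of token pairs a full pairwise attention layer compares."""
    return num_tokens * num_tokens


random.seed(0)
N, C, S = 196, 16, 8  # assumed sizes: 196 patches, 16 channels, 8 tokens
patches = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
score_weights = [[random.gauss(0, 1) for _ in range(C)] for _ in range(S)]

tokens = learn_tokens(patches, score_weights)
print(len(tokens), len(tokens[0]))
print(pairwise_attention_pairs(N), pairwise_attention_pairs(S))
```

Under these assumed sizes, pairwise attention over all 196 patches compares 38,416 pairs, while attention over the 8 pooled tokens compares 64, which is the kind of saving the abstract attributes to working with a small token set.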
Author Information
Michael Ryoo (Google; Stony Brook University)
AJ Piergiovanni (Indiana University)
Anurag Arnab (University of Oxford)
Mostafa Dehghani (Google Brain)
Anelia Angelova (Google Research)
More from the Same Authors
- 2022 Poster: Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space
  Jinghuan Shang · Srijan Das · Michael Ryoo
- 2022 Poster: Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?
  Xiang Li · Jinghuan Shang · Srijan Das · Michael Ryoo
- 2021 Poster: Attention Bottlenecks for Multimodal Fusion
  Arsha Nagrani · Shan Yang · Anurag Arnab · Aren Jansen · Cordelia Schmid · Chen Sun
- 2021 Poster: Compressive Visual Representations
  Kuang-Huei Lee · Anurag Arnab · Sergio Guadarrama · John Canny · Ian Fischer
- 2020 Poster: AViD Dataset: Anonymized Videos from Diverse Countries
  AJ Piergiovanni · Michael S Ryoo
- 2019: Coffee + Posters
  Changhao Chen · Nils Gählert · Edouard Leurent · Johannes Lehner · Apratim Bhattacharyya · Harkirat Singh Behl · Teck Yian Lim · Shiho Kim · Jelena Novosel · Błażej Osiński · Arindam Das · Ruobing Shen · Jeffrey Hawke · Joachim Sicking · Babak Shahian Jahromi · Theja Tulabandhula · Claudio Michaelis · Evgenia Rusak · WENHANG BAO · Hazem Rashed · JP Chen · Amin Ansari · Jaekwang Cha · Mohamed Zahran · Daniele Reda · Jinhyuk Kim · Kim Dohyun · Ho Suk · Junekyo Jhung · Alexander Kister · Matthias Fahrland · Adam Jakubowski · Piotr Miłoś · Jean Mercat · Bruno Arsenali · Silviu Homoceanu · Xiao-Yang Liu · Philip Torr · Ahmad El Sallab · Ibrahim Sobh · Anurag Arnab · Krzysztof Galias