We focus on the problem of efficient video stream processing with fully transformer-based architectures. Recent advances brought by transformers for image-based tasks have inspired research interest in applying transformers to videos. Yet when image-based transformer solutions are applied to videos, the computation becomes inefficient because adjacent video frames carry redundant information. An analysis of the computation cost of the video object detection framework DETR identifies the linear layers as the major computational bottleneck. We therefore propose dynamic gating layers that perform conditional computation. With the generated binary or ternary gates, the computation for stable background tokens in the video frames can be skipped. Experimental results validate the effectiveness of the dynamic gating mechanism for transformers. For video object detection, FLOPs can be reduced by 48.3% without a significant drop in accuracy.
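The idea of skipping linear-layer computation for stable tokens can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the gates are generated, so this sketch assumes a fixed threshold on the frame-to-frame change of each token in place of a learned gate, shows only the binary case, and uses hypothetical names (`linear`, `gated_linear`, `threshold`).

```python
from typing import List, Optional, Tuple

Vector = List[float]


def linear(token: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Dense layer for a single token: y = W x + b."""
    return [sum(w * x for w, x in zip(row, token)) + b
            for row, b in zip(weight, bias)]


def gated_linear(
    tokens: List[Vector],
    prev_tokens: Optional[List[Vector]],
    prev_outputs: Optional[List[Vector]],
    weight: List[Vector],
    bias: Vector,
    threshold: float = 0.1,  # assumed hand-set value, not from the paper
) -> Tuple[List[Vector], List[int]]:
    """Apply a linear layer, reusing cached outputs for tokens that barely changed.

    Returns the outputs and one binary gate per token
    (1 = recomputed, 0 = cached output reused).
    """
    outputs: List[Vector] = []
    gates: List[int] = []
    for i, token in enumerate(tokens):
        if prev_tokens is None or prev_outputs is None:
            gate = 1  # first frame: nothing is cached yet
        else:
            # Largest absolute change of this token since the previous frame.
            change = max(abs(a - b) for a, b in zip(token, prev_tokens[i]))
            gate = 1 if change > threshold else 0
        gates.append(gate)
        outputs.append(linear(token, weight, bias) if gate else prev_outputs[i])
    return outputs, gates


# Toy example: three tokens per frame, only the last one changes noticeably.
weight = [[1.0, 0.0], [0.0, 2.0]]
bias = [0.0, 1.0]
frame0 = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
frame1 = [[1.0, 1.0], [2.0, 2.05], [5.0, 3.0]]

out0, gates0 = gated_linear(frame0, None, None, weight, bias)
out1, gates1 = gated_linear(frame1, frame0, out0, weight, bias)
skipped_fraction = 1 - sum(gates1) / len(gates1)
```

In this toy run only the third token of the second frame is recomputed, so two thirds of the per-token linear-layer work is skipped. The reused output for the slightly changed second token is an approximation, which is the accuracy-for-compute trade-off that such gating accepts.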
Author Information
Yawei Li (Swiss Federal Institute of Technology)
Babak Ehteshami Bejnordi (Qualcomm AI Research)
Bert Moons (Synopsys)
Tijmen Blankevoort (Qualcomm)
Amirhossein Habibian (Qualcomm AI Research)
Radu Timofte (ETH Zurich)
Luc V Gool (Computer Vision Lab, ETH Zurich)
More from the Same Authors
- 2019 Poster: Gated CRF Loss for Weakly Supervised Semantic Image Segmentation
  Anton Obukhov · Stamatios Georgoulis · Dengxin Dai · Luc V Gool
- 2022 Poster: Recurrent Video Restoration Transformer with Guided Deformable Attention
  Jingyun Liang · Yuchen Fan · Xiaoyu Xiang · Rakesh Ranjan · Eddy Ilg · Simon Green · Jiezhang Cao · Kai Zhang · Radu Timofte · Luc V Gool
- 2022 Spotlight: Lightning Talks 5A-4
  Yangrui Chen · Zhiyang Chen · Liang Zhang · Hanqing Wang · Jiaqi Han · Shuchen Wu · shaohui peng · Ganqu Cui · Yoav Kolumbus · Noemi Elteto · Xing Hu · Anwen Hu · Wei Liang · Cong Xie · Lifan Yuan · Noam Nisan · Wenbing Huang · Yousong Zhu · Ishita Dasgupta · Luc V Gool · Tingyang Xu · Rui Zhang · Qin Jin · Zhaowen Li · Meng Ma · Bingxiang He · Yangyi Chen · Juncheng Gu · Wenguan Wang · Ke Tang · Yu Rong · Eric Schulz · Fan Yang · Wei Li · Zhiyuan Liu · Jiaming Guo · Yanghua Peng · Haibin Lin · Haixin Wang · Qi Yi · Maosong Sun · Ruizhi Chen · Chuan Wu · Chaoyang Zhao · Yibo Zhu · Liwei Wu · xishan zhang · Zidong Du · Rui Zhao · Jinqiao Wang · Ling Li · Qi Guo · Ming Tang · Yunji Chen
- 2022 Spotlight: Towards Versatile Embodied Navigation
  Hanqing Wang · Wei Liang · Luc V Gool · Wenguan Wang
- 2022 Spotlight: Recurrent Video Restoration Transformer with Guided Deformable Attention
  Jingyun Liang · Yuchen Fan · Xiaoyu Xiang · Rakesh Ranjan · Eddy Ilg · Simon Green · Jiezhang Cao · Kai Zhang · Radu Timofte · Luc V Gool
- 2022 Poster: I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification
  Muhammad Ferjad Naeem · Yongqin Xian · Luc V Gool · Federico Tombari
- 2022 Poster: Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging
  Yuanhao Cai · Jing Lin · Haoqian Wang · Xin Yuan · Henghui Ding · Yulun Zhang · Radu Timofte · Luc V Gool
- 2022 Poster: Towards Versatile Embodied Navigation
  Hanqing Wang · Wei Liang · Luc V Gool · Wenguan Wang
- 2022 Expo Demonstration: Conditional Compute for On-device Video Understanding
  Tijmen Blankevoort
- 2021 : Real-Time and Accurate Self-Supervised Monocular Depth Estimation on Mobile Device
  Hong Cai · Yinhao Zhu · Janarbek Matai · Fatih Porikli · Fei Yin · Tushar Singhal · Bharath Ramaswamy · Frank Mayer · Chirag Patel · Parham Noorzad · Andrii Skliar · Tijmen Blankevoort · Joseph Soriaga · Ron Tindall · Pat Lawlor
- 2021 Poster: Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations
  Wouter Van Gansbeke · Simon Vandenhende · Stamatios Georgoulis · Luc V Gool
- 2021 Social: Shine in Your Technical Presentation
  Armina Stepan · Tijmen Blankevoort
- 2020 Poster: Bayesian Bits: Unifying Quantization and Pruning
  Mart van Baalen · Christos Louizos · Markus Nagel · Rana Ali Amjad · Ying Wang · Tijmen Blankevoort · Max Welling
- 2020 Poster: GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network
  Prune Truong · Martin Danelljan · Luc V Gool · Radu Timofte
- 2020 Poster: Soft Contrastive Learning for Visual Localization
  Janine Thoma · Danda Pani Paudel · Luc V Gool
- 2017 Poster: Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations
  Eirikur Agustsson · Fabian Mentzer · Michael Tschannen · Lukas Cavigelli · Radu Timofte · Luca Benini · Luc V Gool
- 2016 Poster: Dynamic Filter Networks
  Xu Jia · Bert De Brabandere · Tinne Tuytelaars · Luc V Gool
- 2014 Poster: Quantized Kernel Learning for Feature Matching
  Danfeng Qin · Xuanli Chen · Matthieu Guillaumin · Luc V Gool
- 2014 Poster: Self-Adaptable Templates for Feature Coding
  Xavier Boix · Gemma Roig · Salomon Diether · Luc V Gool
- 2011 Poster: Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
  Angela Yao · Juergen Gall · Luc V Gool · Raquel Urtasun