We focus on the problem of efficient video stream processing with fully transformer-based architectures. Recent advances brought by transformers for image-based tasks have inspired interest in applying transformers to videos. Yet when image-based transformer solutions are applied to videos, the computation becomes inefficient because adjacent video frames contain redundant information. An analysis of the computation cost of the video object detection framework DETR identifies the linear layers as the major computation bottleneck. We therefore propose dynamic gating layers that perform conditional computation. With the generated binary or ternary gates, the computation for stable background tokens in the video frames can be skipped. Experimental results validate the effectiveness of the dynamic gating mechanism for transformers. For video object detection, FLOPs can be reduced by 48.3% without a significant drop in accuracy.
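The idea of skipping linear-layer computation for stable tokens can be illustrated with a minimal sketch. It is not the authors' implementation: the gate here is a fixed threshold on the frame-to-frame token difference, whereas the abstract describes gates that are generated dynamically, and all names (`linear`, `binary_gates`, `gated_linear`, `threshold`) are hypothetical. Only binary gates are shown.

```python
from typing import List, Tuple

Vector = List[float]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Dense layer y = W x + b applied to a single token."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def binary_gates(prev_tokens: List[Vector], cur_tokens: List[Vector],
                 threshold: float) -> List[int]:
    """Return 1 for tokens that should be recomputed and 0 for stable tokens.

    Assumption: a token is "stable" when its squared L2 distance to the same
    token in the previous frame is at most `threshold`. A learned gating
    layer would replace this hand-written rule.
    """
    gates = []
    for prev, cur in zip(prev_tokens, cur_tokens):
        dist = sum((a - b) ** 2 for a, b in zip(prev, cur))
        gates.append(1 if dist > threshold else 0)
    return gates


def gated_linear(prev_tokens: List[Vector], prev_outputs: List[Vector],
                 cur_tokens: List[Vector], weight: List[Vector], bias: Vector,
                 threshold: float) -> Tuple[List[Vector], List[int]]:
    """Recompute the linear layer only for gated-on tokens; reuse cached outputs otherwise."""
    gates = binary_gates(prev_tokens, cur_tokens, threshold)
    outputs = []
    for gate, token, cached in zip(gates, cur_tokens, prev_outputs):
        outputs.append(linear(token, weight, bias) if gate else cached)
    return outputs, gates


# Toy example: three tokens per frame, only the last one changes between frames.
weight = [[1.0, 0.0], [0.0, 2.0]]
bias = [0.0, 1.0]
frame0 = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
frame1 = [[1.0, 1.0], [2.0, 2.0], [5.0, 3.0]]

out0 = [linear(t, weight, bias) for t in frame0]  # first frame: full computation
out1, gates = gated_linear(frame0, out0, frame1, weight, bias, threshold=0.01)
skipped_fraction = 1 - sum(gates) / len(gates)    # share of tokens not recomputed
```

In this toy setting two of the three tokens reuse their cached outputs. The FLOPs actually saved in a real model depend on how many tokens the gates switch off and on the cost of computing the gates themselves.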
Author Information
Yawei Li (Swiss Federal Institute of Technology)
Babak Ehteshami Bejnordi (Qualcomm AI Research)
Bert Moons (Synopsys)
Tijmen Blankevoort (Qualcomm)
Amirhossein Habibian (Qualcomm AI Research)
Radu Timofte (ETH Zurich)
Luc V Gool (Computer Vision Lab, ETH Zurich)
More from the Same Authors
- 2019 Poster: Gated CRF Loss for Weakly Supervised Semantic Image Segmentation
  Anton Obukhov · Stamatios Georgoulis · Dengxin Dai · Luc V Gool
- 2021: Real-Time and Accurate Self-Supervised Monocular Depth Estimation on Mobile Device
  Hong Cai · Yinhao Zhu · Janarbek Matai · Fatih Porikli · Fei Yin · Tushar Singhal · Bharath Ramaswamy · Frank Mayer · Chirag Patel · Parham Noorzad · Andrii Skliar · Tijmen Blankevoort · Joseph Soriaga · Ron Tindall · Pat Lawlor
- 2021 Poster: Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations
  Wouter Van Gansbeke · Simon Vandenhende · Stamatios Georgoulis · Luc V Gool
- 2021 Social: Shine in Your Technical Presentation
  Armina Stepan · Tijmen Blankevoort
- 2020 Poster: Bayesian Bits: Unifying Quantization and Pruning
  Mart van Baalen · Christos Louizos · Markus Nagel · Rana Ali Amjad · Ying Wang · Tijmen Blankevoort · Max Welling
- 2020 Poster: GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network
  Prune Truong · Martin Danelljan · Luc V Gool · Radu Timofte
- 2020 Poster: Soft Contrastive Learning for Visual Localization
  Janine Thoma · Danda Pani Paudel · Luc V Gool
- 2017 Poster: Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations
  Eirikur Agustsson · Fabian Mentzer · Michael Tschannen · Lukas Cavigelli · Radu Timofte · Luca Benini · Luc V Gool
- 2016 Poster: Dynamic Filter Networks
  Xu Jia · Bert De Brabandere · Tinne Tuytelaars · Luc V Gool
- 2014 Poster: Quantized Kernel Learning for Feature Matching
  Danfeng Qin · Xuanli Chen · Matthieu Guillaumin · Luc V Gool
- 2014 Poster: Self-Adaptable Templates for Feature Coding
  Xavier Boix · Gemma Roig · Salomon Diether · Luc V Gool
- 2011 Poster: Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
  Angela Yao · Juergen Gall · Luc V Gool · Raquel Urtasun