DynamicViT: Faster Vision Transformer
Amanuel Mersha · Samuel Assefa
Keywords:
Computer Vision
Abstract
The recent deep learning breakthroughs in language and vision tasks can be attributed largely to large-scale transformers. Unfortunately, their massive size and high compute requirements have limited their use in resource-constrained environments. Dynamic neural networks promise to reduce compute by adjusting the computational path on a per-input basis. We propose a layer-skipping dynamic vision transformer (DynamicViT) that skips layers for each sample based on decisions made by a reinforcement learning agent. Extensive experiments on CIFAR-10 and CIFAR-100 show that this dynamic ViT achieves an average speedup of 40% across batch sizes ranging from 1 to 1024.
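To make the per-sample layer-skipping idea concrete, here is a minimal PyTorch sketch. It is not the paper's implementation: the agent's architecture and the exact skipping mechanism are not specified in the abstract, so this assumes a small policy head (`SkipPolicy`) that scores every layer from the mean token embedding, with hard skip decisions applied per sample at inference time. Training such hard decisions would require a policy-gradient method (e.g. REINFORCE), which is omitted here.

```python
import torch
import torch.nn as nn


class SkipPolicy(nn.Module):
    """Hypothetical agent head: scores each layer's usefulness per sample.

    The abstract does not describe the agent, so a single linear layer
    over the mean token embedding is assumed.
    """

    def __init__(self, dim, num_layers):
        super().__init__()
        self.head = nn.Linear(dim, num_layers)

    def forward(self, x):
        # x: (batch, tokens, dim) -> keep probabilities (batch, num_layers)
        return torch.sigmoid(self.head(x.mean(dim=1)))


class LayerSkippingViT(nn.Module):
    """Sketch of a ViT-style encoder with per-sample layer skipping."""

    def __init__(self, dim=192, depth=12, heads=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(depth)
        )
        self.policy = SkipPolicy(dim, depth)

    def forward(self, tokens):
        keep = self.policy(tokens) > 0.5          # hard skip decisions
        for i, layer in enumerate(self.layers):
            mask = keep[:, i]                     # samples that use layer i
            if mask.any():
                out = tokens.clone()
                out[mask] = layer(tokens[mask])   # run layer only on kept samples
                tokens = out
        return tokens


# Usage: 4 samples of 64 patch tokens, each taking its own path.
x = torch.randn(4, 64, 192)
model = LayerSkippingViT()
print(model(x).shape)  # torch.Size([4, 64, 192])
```

The savings come from the fact that a skipped layer is never evaluated for that sample; in a batched setting the masking shown above only saves compute when whole sub-batches share a path, which is one reason reported speedups vary with batch size.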