Poster
Real-time Core-Periphery Guided ViT with Smart Data Layout Selection on Mobile Devices
Zhihao Shu · Xiaowei Yu · Zihao Wu · Wenqi Jia · Yinchen Shi · Miao Yin · Tianming Liu · Dajiang Zhu · Wei Niu
East Exhibit Hall A-C #2002
Mobile devices have become an essential enabler of AI applications, particularly in scenarios that demand real-time performance. The Vision Transformer (ViT) has become a cornerstone in this regard due to its high accuracy. Recent efforts have been dedicated to developing Transformer architectures that offer improved accuracy while reducing computational requirements. However, existing research primarily focuses on reducing theoretical computation through methods such as local attention and model pruning, rather than on realistic performance on mobile hardware. Although these optimizations reduce computational demands, they introduce either additional data-transformation overheads (e.g., Reshape and Transpose) or irregular computation and data-access patterns. On mobile devices with limited bandwidth, these overheads are significant and can make latency even worse than that of vanilla ViT. In this paper, we present ECP-ViT, a real-time framework that uses the core-periphery principle, inspired by brain functional networks, to guide self-attention in ViTs and enable the deployment of ViT models on smartphones. We identify data transformation as the main bottleneck in Transformer structures and propose a hardware-friendly core-periphery guided self-attention that decreases computation demands. Additionally, we design comprehensive system-level optimizations to completely eliminate data transformation operations. With the proposed algorithm-system co-optimizations, ECP-ViT achieves speedups of 4.6x to 26.9x on mobile GPUs across four datasets: STL-10, CIFAR-100, TinyImageNet, and ImageNet.
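The abstract does not spell out how core-periphery structure constrains attention, so the following is a minimal sketch of one plausible interpretation: a fixed set of "core" tokens attends to all tokens, while "periphery" tokens attend only to the core (and themselves), replacing the dense token-to-token interaction with a structured mask. The function names, the choice of the first `num_core` tokens as the core, and the masking scheme are all assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def core_periphery_mask(num_tokens: int, num_core: int) -> torch.Tensor:
    """Boolean attention mask for a core-periphery pattern (hypothetical layout).

    The first `num_core` tokens are treated as the core: core tokens attend
    to every token, periphery tokens attend only to core tokens and themselves.
    """
    mask = torch.zeros(num_tokens, num_tokens, dtype=torch.bool)
    mask[:num_core, :] = True                          # core rows: attend to all tokens
    mask[:, :num_core] = True                          # all rows: attend to core tokens
    mask |= torch.eye(num_tokens, dtype=torch.bool)    # keep self-attention for periphery
    return mask

def masked_self_attention(q, k, v, mask):
    """Scaled dot-product attention restricted by a boolean mask.

    q, k, v: (batch, heads, tokens, head_dim); mask: (tokens, tokens).
    """
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: 197 tokens (CLS + 14x14 patches), 32 of them treated as core.
q = k = v = torch.randn(1, 8, 197, 64)
mask = core_periphery_mask(197, 32)
out = masked_self_attention(q, k, v, mask)
print(out.shape)  # torch.Size([1, 8, 197, 64])
```

Note that this sketch still materializes the full attention matrix and masks it afterward; the hardware-friendly gains described in the paper would come from computing only the masked entries and laying out data so that no Reshape/Transpose is needed, which is beyond this illustration.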