Because GPUs are expensive and energy-hungry, deploying deep models on IoT devices such as microcontrollers contributes significantly to ecological AI. Conventional methods enable convolutional neural network inference on high-resolution images on microcontrollers, but a framework for vision transformers, which achieve state-of-the-art performance in many vision applications, remains unexplored. In this paper, we propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory, where we jointly design the transformer architecture and construct the inference operator library to fit the memory resource constraint. More specifically, we generalize one-shot network architecture search (NAS) to discover the optimal architecture with the highest task performance given the memory budget of the microcontroller, and we enlarge the existing search space of vision transformers by considering the low-rank decomposition dimensions and patch resolution for memory reduction. To construct the inference operator library for vision transformers, we schedule the memory buffer during inference through operator integration, patch embedding decomposition, and token overwriting, allowing the memory buffer to be fully utilized to adapt to the forward pass of the vision transformer. Experimental results demonstrate that MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory on an STM32F746 microcontroller. Code is available at https://github.com/liangyn22/MCUFormer.
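The abstract names patch resolution and low-rank decomposition dimensions as search variables that reduce memory. The sketch below is illustrative arithmetic only, not the MCUFormer memory model or search procedure: it shows why a coarser patch grid shrinks the token buffer and why factoring a weight matrix into two low-rank factors shrinks its weight count. The function names, the embedding dimension of 192, the rank of 32, and the 1-byte (int8) value size are all assumptions chosen for the example; only the 320KB budget comes from the abstract.

```python
# Illustrative arithmetic only. All names and numeric choices here are
# hypothetical and are NOT taken from the MCUFormer code base.

BUDGET_BYTES = 320 * 1024  # 320KB memory figure quoted in the abstract


def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def token_buffer_bytes(tokens: int, embed_dim: int, bytes_per_value: int = 1) -> int:
    """Size of one token buffer (tokens x embed_dim), assuming int8 values."""
    return tokens * embed_dim * bytes_per_value


def dense_weights(d_in: int, d_out: int) -> int:
    """Weight count of a full d_in x d_out linear layer (bias ignored)."""
    return d_in * d_out


def low_rank_weights(d_in: int, d_out: int, rank: int) -> int:
    """Weight count after factoring W into (d_in x rank) @ (rank x d_out)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    embed_dim = 192  # assumed embedding dimension
    for patch in (16, 32):
        tokens = num_tokens(224, patch)
        size = token_buffer_bytes(tokens, embed_dim)
        print(f"patch={patch}: tokens={tokens}, buffer={size} B, "
              f"fits={size <= BUDGET_BYTES}")

    print("dense weights:   ", dense_weights(embed_dim, embed_dim))
    print("low-rank weights:", low_rank_weights(embed_dim, embed_dim, rank=32))
```

Under these assumptions, moving from 16-pixel to 32-pixel patches cuts the token count by a factor of four, and a rank-32 factorization of a 192 x 192 layer keeps one third of the weights. A real deployment has to account for peak memory across all simultaneously live buffers, which this sketch does not model.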
Author Information
Yinan Liang (Tsinghua University)
Ziwei Wang (Tsinghua University)
Xiuwei Xu (Tsinghua University)
Yansong Tang (University of Oxford)
Jie Zhou (Tsinghua University)
Jiwen Lu (Tsinghua University)
More from the Same Authors
- 2022 Poster: OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
  Wanhua Li · Xiaoke Huang · Zheng Zhu · Yansong Tang · Xiu Li · Jie Zhou · Jiwen Lu
- 2022 Poster: P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
  Ziyi Wang · Xumin Yu · Yongming Rao · Jie Zhou · Jiwen Lu
- 2023 Poster: VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
  Wenhai Wang · Zhe Chen · Xiaokang Chen · Jiannan Wu · Xizhou Zhu · Gang Zeng · Ping Luo · Tong Lu · Jie Zhou · Yu Qiao · Jifeng Dai
- 2023 Poster: SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
  Zhuoyan Luo · Yicheng Xiao · Yong Liu · Shuyan Li · Yitong Wang · Yansong Tang · Xiu Li · Yujiu Yang
- 2023 Poster: UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
  Wenliang Zhao · Lujia Bai · Yongming Rao · Jie Zhou · Jiwen Lu
- 2022 Spotlight: P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
  Ziyi Wang · Xumin Yu · Yongming Rao · Jie Zhou · Jiwen Lu
- 2022 Spotlight: Lightning Talks 6A-1
  Ziyi Wang · Nian Liu · Yaming Yang · Qilong Wang · Yuanxin Liu · Zongxin Yang · Yizhao Gao · Yanchen Deng · Dongze Lian · Nanyi Fei · Ziyu Guan · Xiao Wang · Shufeng Kong · Xumin Yu · Daquan Zhou · Yi Yang · Fandong Meng · Mingze Gao · Caihua Liu · Yongming Rao · Zheng Lin · Haoyu Lu · Zhe Wang · Jiashi Feng · Zhaolin Zhang · Deyu Bo · Xinchao Wang · Chuan Shi · Jiangnan Li · Jiangtao Xie · Jie Zhou · Zhiwu Lu · Wei Zhao · Bo An · Jiwen Lu · Peihua Li · Jian Pei · Hao Jiang · Cai Xu · Peng Fu · Qinghua Hu · Yijie Li · Weigang Lu · Yanan Cao · Jianbin Huang · Weiping Wang · Zhao Cao · Jie Zhou
- 2022 Poster: HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
  Yongming Rao · Wenliang Zhao · Yansong Tang · Jie Zhou · Ser Nam Lim · Jiwen Lu
- 2021 Poster: DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
  Yongming Rao · Wenliang Zhao · Benlin Liu · Jiwen Lu · Jie Zhou · Cho-Jui Hsieh
- 2021 Poster: Global Filter Networks for Image Classification
  Yongming Rao · Wenliang Zhao · Zheng Zhu · Jiwen Lu · Jie Zhou
- 2017 Poster: Runtime Neural Pruning
  Ji Lin · Yongming Rao · Jiwen Lu · Jie Zhou