Model quantization has emerged as an indispensable technique for accelerating deep learning inference. Although researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable, because researchers do not use consistent training pipelines and ignore the requirements of hardware deployment. In this work, we propose the Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate a wide range of state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts as a bridge connecting the algorithm and the hardware. We conduct a comprehensive analysis and find a considerable number of intuitive or counter-intuitive insights. By aligning the training settings, we find that existing algorithms achieve about the same performance on the conventional academic track. For hardware-deployable quantization, however, there is a huge accuracy gap, and there is still a long way to go. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work can inspire future research directions.
Author Information
Yuhang Li (Yale University)
Mingzhu Shen (Sensetime Research)
Jian Ma (Xi'an Jiaotong University)
Yan Ren (Xidian University)
Mingxin Zhao
Qi Zhang (Beihang University)
Ruihao Gong (Beihang University)
Fengwei Yu (Beihang University)
Junjie Yan (Sensetime Group Limited)
More from the Same Authors
- 2022 Poster: Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
  Xiuying Wei · Yunchen Zhang · Xiangguo Zhang · Ruihao Gong · Shanghang Zhang · Qi Zhang · Fengwei Yu · Xianglong Liu
- 2022: Fast-BEV: Towards Real-time On-vehicle Bird’s-Eye View Perception
  Bin Huang · Yangguang Li · Feng Liang · Enze Xie · Luya Wang · Mingzhu Shen · Fenggang Liu · Tianqi Wang · Ping Luo · Jing Shao
- 2022 Spotlight: Lightning Talks 6B-3
  Lingfeng Yang · Yao Lai · Zizheng Pan · Zhenyu Wang · Weicong Liang · Chuanyang Zheng · Jian-Wei Zhang · Peng Jin · Jing Liu · Xiuying Wei · Yao Mu · Xiang Li · YUHUI YUAN · Zizheng Pan · Yifan Sun · Yunchen Zhang · Jianfei Cai · Hao Luo · zheyang li · Jinfa Huang · Haoyu He · Yi Yang · Ping Luo · Fenglin Liu · Henghui Ding · Borui Zhao · Xiangguo Zhang · Kai Zhang · Pichao WANG · Bohan Zhuang · Wei Chen · Ruihao Gong · Zhi Yang · Xian Wu · Feng Ding · Jianfei Cai · Xiao Luo · Renjie Song · Weihong Lin · Jian Yang · Wenming Tan · Bohan Zhuang · Shanghang Zhang · Shen Ge · Fan Wang · Qi Zhang · Guoli Song · Jun Xiao · Hao Li · Ding Jia · David Clifton · Ye Ren · Fengwei Yu · Zheng Zhang · Jie Chen · Shiliang Pu · Xianglong Liu · Chao Zhang · Han Hu
- 2022 Spotlight: Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
  Xiuying Wei · Yunchen Zhang · Xiangguo Zhang · Ruihao Gong · Shanghang Zhang · Qi Zhang · Fengwei Yu · Xianglong Liu
- 2022: Wearable-based Human Activity Recognition with Spatio-Temporal Spiking Neural Networks
  Yuhang Li · Ruokai Yin · Hyoungseob Park · Youngeun Kim · Priyadarshini Panda
- 2021 Poster: Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks
  Yuhang Li · Yufei Guo · Shanghang Zhang · Shikuang Deng · Yongqing Hai · Shi Gu
- 2020 Poster: Improving Auto-Augment via Augmentation-Wise Weight Sharing
  Keyu Tian · Chen Lin · Ming Sun · Luping Zhou · Junjie Yan · Wanli Ouyang
- 2019 Poster: Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection
  Junran Peng · Ming Sun · ZHAO-XIANG ZHANG · Tieniu Tan · Junjie Yan
- 2018 Poster: Synaptic Strength For Convolutional Neural Network
  CHEN LIN · Zhao Zhong · Wu Wei · Junjie Yan