Poster
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
Patrick Chen · Si Si · Yang Li · Ciprian Chelba · Cho-Jui Hsieh
Model compression is essential for serving large deep neural nets on devices with limited resources or in applications that require real-time responses. For advanced NLP problems, a neural language model usually consists of recurrent layers (e.g., using LSTM cells), an embedding matrix for representing input tokens, and a softmax layer for generating output tokens. For problems with a very large vocabulary, the embedding and softmax matrices can account for more than half of the model size. For instance, the bigLSTM model achieves state-of-the-art performance on the One-Billion-Word (OBW) dataset with a vocabulary of around 800k words; its word embedding and softmax matrices use more than 6 GB of space and account for over 90% of the model parameters. In this paper, we propose GroupReduce, a novel compression method for neural language models based on vocabulary-partition (block) low-rank matrix approximation, which exploits the inherent frequency distribution of tokens (the power-law distribution of words). We start by grouping words into $c$ blocks based on their frequency, and then refine the clustering iteratively by constructing a weighted low-rank approximation for each block, where the weights are based on the frequencies of the words in the block. Experimental results show that our method significantly outperforms traditional compression methods such as low-rank approximation and pruning. On the OBW dataset, our method achieves a 6.6x compression rate for the embedding and softmax matrices, and when combined with quantization, a 26x compression rate without losing prediction accuracy.
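The core step the abstract describes is a frequency-weighted low-rank factorization applied independently to each vocabulary block: for a block $A_j$ with word frequencies $f_i$, find factors $U_j, V_j$ minimizing $\sum_i f_i \|A_{j,i} - (U_j V_j)_i\|^2$. Below is a minimal NumPy sketch of that step under stated assumptions, not the paper's exact implementation: the function names (`weighted_low_rank`, `group_reduce`), the equal-size frequency partition, and the fixed per-block ranks are illustrative, and the paper's iterative refinement of the clustering is omitted.

```python
import numpy as np

def weighted_low_rank(block, freqs, rank):
    # Row-weighted low-rank approximation: minimize
    #   sum_i freqs[i] * ||block[i] - (U @ V)[i]||^2.
    # For per-row weights this reduces to an ordinary SVD of the
    # row-rescaled matrix diag(sqrt(freqs)) @ block.
    w = np.sqrt(freqs)[:, None]
    U, S, Vt = np.linalg.svd(w * block, full_matrices=False)
    U_r = (U[:, :rank] * S[:rank]) / w  # undo the row rescaling
    V_r = Vt[:rank]
    return U_r, V_r                     # block is approximated by U_r @ V_r

def group_reduce(embedding, freqs, num_blocks, ranks):
    # Partition the vocabulary into blocks by descending frequency and
    # compress each block independently; frequent blocks can be given a
    # higher rank than rare ones (ranks is a list of length num_blocks).
    order = np.argsort(-freqs)
    blocks = np.array_split(order, num_blocks)
    factors = []
    for idx, rank in zip(blocks, ranks):
        U, V = weighted_low_rank(embedding[idx], freqs[idx], rank)
        factors.append((idx, U, V))
    return factors
```

Storing the pair $(U_j, V_j)$ for a block of $m$ words replaces an $m \times d$ slice with $m \times r_j + r_j \times d$ parameters, which is where the compression comes from when $r_j \ll d$; giving rare-word blocks a smaller $r_j$ is what lets the block-wise scheme beat a single global low-rank factorization.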
Author Information
Patrick Chen (UCLA)
Si Si (Google Research)
Yang Li (Google)
Yang Li is a Senior Staff Research Scientist at Google and an affiliate faculty member at the University of Washington CSE, focusing on the intersection of AI and HCI. He pioneered on-device interactive ML on Android by developing impactful product features such as next app prediction and Gesture Search. Yang has published extensively in top venues across both the HCI and ML fields, including CHI, UIST, ICML, ACL, EMNLP, CVPR, NeurIPS (NIPS), ICLR, and KDD, and has regularly served as an area chair or senior area (track) chair in both fields. Yang is also an editor of the upcoming Springer book "AI for HCI: A Modern Approach", the first thorough treatment of the topic.
Ciprian Chelba (Google)
Cho-Jui Hsieh (UCLA, Google Research)
More from the Same Authors
- 2021 Poster: Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding
  Yang Li · Si Si · Gang Li · Cho-Jui Hsieh · Samy Bengio
- 2020 Poster: Multi-Stage Influence Function
  Hongge Chen · Si Si · Yang Li · Ciprian Chelba · Sanjiv Kumar · Duane Boning · Cho-Jui Hsieh
- 2019 Poster: Robustness Verification of Tree-based Models
  Hongge Chen · Huan Zhang · Si Si · Yang Li · Duane Boning · Cho-Jui Hsieh
- 2019 Poster: A Unified Framework for Data Poisoning Attack to Graph-based Semi-supervised Learning
  Xuanqing Liu · Si Si · Jerry Zhu · Yang Li · Cho-Jui Hsieh
- 2018 Poster: Learning from Group Comparisons: Exploiting Higher Order Interactions
  Yao Li · Minhao Cheng · Kevin Fujii · Fushing Hsieh · Cho-Jui Hsieh
- 2018 Poster: Efficient Neural Network Robustness Certification with General Activation Functions
  Huan Zhang · Tsui-Wei Weng · Pin-Yu Chen · Cho-Jui Hsieh · Luca Daniel