Modern pre-trained transformers have rapidly advanced the state of the art in machine learning, but have also grown in parameter count and computational complexity, making them increasingly difficult to deploy in resource-constrained environments. Binarizing the weights and activations of the network can significantly alleviate these issues; however, it is technically challenging from an optimization perspective. In this work, we identify a series of improvements that enable binary transformers at a much higher accuracy than was previously possible. These include a two-set binarization scheme, a novel elastic binary activation function with learned parameters, and a method to quantize a network to its limit by successively distilling higher-precision models into lower-precision students. These approaches allow, for the first time, fully binarized transformer models at a practical level of accuracy, coming within as little as 5.9% of a full-precision BERT baseline on the GLUE language understanding benchmark. Code and models are available at: https://github.com/facebookresearch/bit.
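To make the elastic binary activation concrete, below is a minimal PyTorch sketch of one plausible form: activations are mapped to {0, 1} via clip-and-round with a learnable scale (alpha) and threshold (beta), trained with a straight-through estimator. The class name ElasticBinarization and the exact parameterization are illustrative assumptions, not the authors' exact implementation; see the linked repository for the real code.

```python
import torch
import torch.nn as nn

class ElasticBinarization(nn.Module):
    """Sketch of an elastic binary activation with learned scale and
    threshold (hypothetical illustration, not the official BiT code)."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))   # learned scale
        self.beta = nn.Parameter(torch.zeros(1))   # learned threshold

    def forward(self, x):
        # Shift by the threshold, rescale, then binarize to {0, 1}.
        x_scaled = (x - self.beta) / self.alpha
        x_clipped = torch.clamp(x_scaled, 0.0, 1.0)
        x_binary = torch.round(x_clipped)
        # Straight-through estimator: the forward pass uses the hard
        # rounded value, while gradients flow through the clipped value.
        x_ste = x_clipped + (x_binary - x_clipped).detach()
        return self.alpha * x_ste
```

Because alpha and beta are ordinary parameters, the binarization thresholds adapt during training rather than being fixed, which is the key property the abstract attributes to the elastic activation.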
Author Information
Zechun Liu (CMU)
Barlas Oguz (Facebook AI)
Aasish Pappu (Meta AI)
Lin Xiao (Meta AI Research)
Scott Yih (Meta AI - FAIR)
Meng Li (Peking University)
Raghuraman Krishnamoorthi (Facebook)
Yashar Mehdad (Facebook)
More from the Same Authors
- 2023: FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation »
  Sewon Min · Kalpesh Krishna · Xinxi Lyu · Mike Lewis · Scott Yih · Pang Wei Koh · Mohit Iyyer · Luke Zettlemoyer · Hannaneh Hajishirzi
- 2023 Poster: CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference »
  Wenxuan Zeng · Meng Li · Haichuan Yang · Wen-jie Lu · Runsheng Wang · Ru Huang
- 2022 Spotlight: BiT: Robustly Binarized Multi-distilled Transformer »
  Zechun Liu · Barlas Oguz · Aasish Pappu · Lin Xiao · Scott Yih · Meng Li · Raghuraman Krishnamoorthi · Yashar Mehdad
- 2022 Poster: Autoregressive Search Engines: Generating Substrings as Document Identifiers »
  Michele Bevilacqua · Giuseppe Ottaviano · Patrick Lewis · Scott Yih · Sebastian Riedel · Fabio Petroni
- 2019 Poster: Using Statistics to Automate Stochastic Optimization »
  Hunter Lang · Lin Xiao · Pengchuan Zhang
- 2019 Poster: Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization »
  Koen Helwegen · James Widdicombe · Lukas Geiger · Zechun Liu · Kwang-Ting Cheng · Roeland Nusselder
- 2019 Poster: A Stochastic Composite Gradient Method with Incremental Variance Reduction »
  Junyu Zhang · Lin Xiao
- 2019 Poster: Understanding the Role of Momentum in Stochastic Gradient Methods »
  Igor Gitman · Hunter Lang · Pengchuan Zhang · Lin Xiao
- 2019 Invited Talk: Test of Time: Dual Averaging Method for Regularized Stochastic Learning and Online Optimization »
  Lin Xiao
- 2018 Poster: Learning SMaLL Predictors »
  Vikas Garg · Ofer Dekel · Lin Xiao
- 2018 Poster: Coupled Variational Bayes via Optimization Embedding »
  Bo Dai · Hanjun Dai · Niao He · Weiyang Liu · Zhen Liu · Jianshu Chen · Lin Xiao · Le Song
- 2017 Poster: Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes »
  Jianshu Chen · Chong Wang · Lin Xiao · Ji He · Lihong Li · Li Deng
- 2015 Poster: End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture »
  Jianshu Chen · Ji He · Yelong Shen · Lin Xiao · Xiaodong He · Jianfeng Gao · Xinying Song · Li Deng
- 2014 Poster: An Accelerated Proximal Coordinate Gradient Method »
  Qihang Lin · Zhaosong Lu · Lin Xiao
- 2012 Session: Oral Session 3 »
  Lin Xiao
- 2009 Poster: Dual Averaging Method for Regularized Stochastic Learning and Online Optimization »
  Lin Xiao