Neural language models (NLMs) have recently regained interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks. However, NLMs are computationally demanding, largely due to the cost of the decoding step, which applies a softmax layer over a large vocabulary. We observe that in decoding for many NLP tasks, only the probabilities of the top-K hypotheses need to be computed precisely, and K is often much smaller than the vocabulary size. This paper proposes a novel softmax-layer approximation algorithm, called Fast Graph Decoder (FGD), which quickly identifies, for a given context, a set of K words that are most likely to occur according to an NLM. We demonstrate that FGD reduces decoding time by an order of magnitude while attaining accuracy close to the full-softmax baseline on neural machine translation and language modeling tasks. We also prove a theoretical guarantee on the quality of the softmax approximation.
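To make the top-K decoding idea concrete, here is a minimal NumPy sketch, not the authors' implementation: it computes the logits for one decoding step, keeps only the K largest, and renormalizes over those K candidates. The exact argpartition scan below is a stand-in for FGD's fast candidate search, and all names (`topk_softmax`, `h`, `W`, `b`) are illustrative assumptions.

```python
import numpy as np

def topk_softmax(h, W, b, K):
    """One decoding step that scores only the top-K words.

    h: (d,) context vector produced by the NLM for the current step
    W: (V, d) output word embeddings; b: (V,) output biases
    Returns (indices, probs) for the K highest-scoring words.
    """
    # Full O(V*d) logit scan over the vocabulary; the point of FGD
    # is to replace this step with a fast approximate search that
    # retrieves the top-K candidates without scoring every word.
    logits = W @ h + b
    # Exact top-K selection as a stand-in for the approximate search.
    top = np.argpartition(-logits, K)[:K]
    top = top[np.argsort(-logits[top])]
    # Renormalize over the K retrieved words only (numerically stable).
    z = logits[top] - logits[top].max()
    p = np.exp(z)
    return top, p / p.sum()

# Illustrative usage with random weights in place of a trained NLM.
rng = np.random.default_rng(0)
V, d, K = 50000, 512, 10
W, b, h = rng.normal(size=(V, d)), rng.normal(size=V), rng.normal(size=d)
words, probs = topk_softmax(h, W, b, K)
```

Because beam search and greedy decoding only compare relative scores of the surviving hypotheses, restricting the softmax to the K retrieved words is sufficient for these tasks, which is why K can be far smaller than the vocabulary size.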
Author Information
Minjia Zhang (Microsoft)
Wenhan Wang (Microsoft)
Xiaodong Liu (Microsoft)
Jianfeng Gao (Microsoft Research, Redmond, WA)
Yuxiong He (Microsoft)
More from the Same Authors
- 2021 Spotlight: Focal Attention for Long-Range Interactions in Vision Transformers
  Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao
- 2021: Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
  Boxin Wang · Chejian Xu · Shuohang Wang · Zhe Gan · Yu Cheng · Jianfeng Gao · Ahmed Awadallah · Bo Li
- 2021: Few-Shot Learning Evaluation in Natural Language Understanding
  Subhabrata Mukherjee · Xiaodong Liu · Guoqing Zheng · Saghar Hosseini · Hao Cheng · Ge Yang · Christopher Meek · Ahmed Awadallah · Jianfeng Gao
- 2021 Poster: NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM
  Connor Holmes · Minjia Zhang · Yuxiong He · Bo Wu
- 2021 Poster: Focal Attention for Long-Range Interactions in Vision Transformers
  Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao
- 2021 Poster: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
  Ge Yang · Edward Hu · Igor Babuschkin · Szymon Sidor · Xiaodong Liu · David Farhi · Nick Ryder · Jakub Pachocki · Weizhu Chen · Jianfeng Gao
- 2021: WebQA Competition + Q&A
  Yingshan CHANG · Yonatan Bisk · Mridu Narang · Levi Melnick · Jianfeng Gao · Hisami Suzuki · Guihong Cao
- 2020 Poster: HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory
  Jie Ren · Minjia Zhang · Dong Li
- 2020 Poster: Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
  Minjia Zhang · Yuxiong He
- 2020 Poster: AdaTune: Adaptive Tensor Program Compilation Made Efficient
  Menghao Li · Minjia Zhang · Chi Wang · Mingqin Li
- 2019 Poster: Unified Language Model Pre-training for Natural Language Understanding and Generation
  Li Dong · Nan Yang · Wenhui Wang · Furu Wei · Xiaodong Liu · Yu Wang · Jianfeng Gao · Ming Zhou · Hsiao-Wuen Hon
- 2018 Poster: M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search
  Yelong Shen · Jianshu Chen · Po-Sen Huang · Yuqing Guo · Jianfeng Gao
- 2018 Poster: Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization
  Yizhe Zhang · Michel Galley · Jianfeng Gao · Zhe Gan · Xiujun Li · Chris Brockett · Bill Dolan
- 2017: Invited Talk: Microsoft (Asli and Jianfeng)
  Jianfeng Gao
- 2015 Poster: End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture
  Jianshu Chen · Ji He · Yelong Shen · Lin Xiao · Xiaodong He · Jianfeng Gao · Xinying Song · Li Deng