Autoregressive sequence models achieve state-of-the-art performance in domains such as machine translation. However, due to the nature of the autoregressive factorization, these models suffer from high latency during inference. Recently, non-autoregressive sequence models were proposed to reduce inference time. However, these models assume that the decoding of each token is conditionally independent of the others. Such a generation process sometimes makes the output sentence inconsistent, and thus the learned non-autoregressive models achieve only inferior accuracy compared to their autoregressive counterparts. To improve decoding consistency and reduce inference cost at the same time, we propose to incorporate a structured inference module into non-autoregressive models. Specifically, we design an efficient approximation of Conditional Random Fields (CRF) for non-autoregressive sequence models, and further propose a dynamic transition technique to model positional contexts in the CRF. Experiments in machine translation show that, while adding little latency (8-14 ms), our model achieves significantly better translation performance than previous non-autoregressive models on different translation datasets. In particular, for the WMT14 En-De dataset, our model obtains a BLEU score of 26.80, which largely outperforms previous non-autoregressive baselines and is only 0.61 BLEU lower than purely autoregressive models.
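To make the structured-inference idea concrete, below is a minimal sketch of linear-chain CRF Viterbi decoding over the per-position label scores a non-autoregressive decoder would produce, with the full vocabulary-size transition matrix approximated by a low-rank product. The function name `viterbi_lowrank`, the factors `E1`/`E2`, and all shapes are illustrative assumptions for this sketch, not the paper's exact parameterization, and it omits the dynamic (position-dependent) transition technique.

```python
# Sketch: Viterbi decoding for a linear-chain CRF whose V x V transition
# matrix is approximated by a low-rank product E1 @ E2.T. Illustrative
# only; not the authors' exact parameterization.
import numpy as np

def viterbi_lowrank(emissions, E1, E2):
    """Return the highest-scoring label sequence.

    emissions: (T, V) per-position label scores from the NAT decoder
    E1, E2:    (V, r) low-rank transition factors, r << V
    """
    T, V = emissions.shape
    transitions = E1 @ E2.T              # (V, V) approximate transition scores
    score = emissions[0].copy()          # best score ending in each label at t=0
    backptr = np.zeros((T, V), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best path ending in label i at t-1, then label j at t
        cand = score[:, None] + transitions
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    # Trace back the argmax path from the final position.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Tiny usage example with random scores.
rng = np.random.default_rng(0)
T, V, r = 5, 8, 2
labels = viterbi_lowrank(rng.normal(size=(T, V)),
                         rng.normal(size=(V, r)),
                         rng.normal(size=(V, r)))
print(labels)  # one label index per output position
```

An efficient implementation would exploit the factorization directly (and compute transitions dynamically from context) rather than materializing the full V x V matrix; the sketch materializes it only for clarity.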
Author Information
Zhiqing Sun (Carnegie Mellon University)
Zhuohan Li (UC Berkeley)
Haoqing Wang (Peking University)
Di He (Peking University)
Zi Lin (Peking University)
Zhihong Deng (Peking University)
More from the Same Authors
- 2023 : GeoMFormer: A General Architecture for Geometric Molecular Representation Learning »
  Tianlang Chen · Shengjie Luo · Di He · Shuxin Zheng · Tie-Yan Liu · Liwei Wang
- 2023 Poster: Focus Your Attention when Few-Shot Classification »
  Haoqing Wang · Shibo Jie · Zhihong Deng
- 2023 Poster: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision »
  Zhiqing Sun · Yikang Shen · Qinhong Zhou · Hongxin Zhang · Zhenfang Chen · David Cox · Yiming Yang · Chuang Gan
- 2023 Poster: Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective »
  Guhao Feng · Bohang Zhang · Yuntian Gu · Haotian Ye · Di He · Liwei Wang
- 2023 Poster: GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning »
  Haiteng Zhao · Shengchao Liu · Ma Chang · Hannan Xu · Jie Fu · Zhihong Deng · Lingpeng Kong · Qi Liu
- 2023 Oral: Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective »
  Guhao Feng · Bohang Zhang · Yuntian Gu · Haotian Ye · Di He · Liwei Wang
- 2023 Poster: DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization »
  Zhiqing Sun · Yiming Yang
- 2023 Poster: Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena »
  Lianmin Zheng · Wei-Lin Chiang · Ying Sheng · Siyuan Zhuang · Zhanghao Wu · Yonghao Zhuang · Zi Lin · Zhuohan Li · Dacheng Li · Eric Xing · Hao Zhang · Joseph Gonzalez · Ion Stoica
- 2022 Spotlight: Lightning Talks 4A-3 »
  Zhihan Gao · Yabin Wang · Xingyu Qu · Luziwei Leng · Mingqing Xiao · Bohan Wang · Yu Shen · Zhiwu Huang · Xingjian Shi · Qi Meng · Yupeng Lu · Diyang Li · Qingyan Meng · Kaiwei Che · Yang Li · Hao Wang · Huishuai Zhang · Zongpeng Zhang · Kaixuan Zhang · Xiaopeng Hong · Xiaohan Zhao · Di He · Jianguo Zhang · Yaofeng Tu · Bin Gu · Yi Zhu · Ruoyu Sun · Yuyang (Bernie) Wang · Zhouchen Lin · Qinghu Meng · Wei Chen · Wentao Zhang · Bin CUI · Jie Cheng · Zhi-Ming Ma · Mu Li · Qinghai Guo · Dit-Yan Yeung · Tie-Yan Liu · Jianxing Liao
- 2022 Spotlight: Online Training Through Time for Spiking Neural Networks »
  Mingqing Xiao · Qingyan Meng · Zongpeng Zhang · Di He · Zhouchen Lin
- 2022 Poster: DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems »
  Ruizhong Qiu · Zhiqing Sun · Yiming Yang
- 2022 Poster: Is $L^2$ Physics Informed Loss Always Suitable for Training Physics Informed Neural Network? »
  Chuwei Wang · Shanda Li · Di He · Liwei Wang
- 2022 Poster: Your Transformer May Not be as Powerful as You Expect »
  Shengjie Luo · Shanda Li · Shuxin Zheng · Tie-Yan Liu · Liwei Wang · Di He
- 2022 Poster: Rethinking Lipschitz Neural Networks and Certified Robustness: A Boolean Function Perspective »
  Bohang Zhang · Du Jiang · Di He · Liwei Wang
- 2022 Poster: Online Training Through Time for Spiking Neural Networks »
  Mingqing Xiao · Qingyan Meng · Zongpeng Zhang · Di He · Zhouchen Lin
- 2021 Poster: Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding »
  Shengjie Luo · Shanda Li · Tianle Cai · Di He · Dinglan Peng · Shuxin Zheng · Guolin Ke · Liwei Wang · Tie-Yan Liu
- 2021 Poster: Do Transformers Really Perform Badly for Graph Representation? »
  Chengxuan Ying · Tianle Cai · Shengjie Luo · Shuxin Zheng · Guolin Ke · Di He · Yanming Shen · Tie-Yan Liu
- 2018 Poster: Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation »
  Tianyu He · Xu Tan · Yingce Xia · Di He · Tao Qin · Zhibo Chen · Tie-Yan Liu
- 2018 Poster: FRAGE: Frequency-Agnostic Word Representation »
  Chengyue Gong · Di He · Xu Tan · Tao Qin · Liwei Wang · Tie-Yan Liu
- 2017 Poster: Decoding with Value Networks for Neural Machine Translation »
  Di He · Hanqing Lu · Yingce Xia · Tao Qin · Liwei Wang · Tie-Yan Liu
- 2016 Poster: Dual Learning for Machine Translation »
  Di He · Yingce Xia · Tao Qin · Liwei Wang · Nenghai Yu · Tie-Yan Liu · Wei-Ying Ma