Poster
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Tong Wu · Zhihao Fan · Xiao Liu · Hai-Tao Zheng · Yeyun Gong · Yelong Shen · Jian Jiao · Juntao Li · Zhongyu Wei · Jian Guo · Nan Duan · Weizhu Chen
Event URL: https://github.com/microsoft/ProphetNet/tree/master/AR-diffusion
Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has recently been extended to text generation by generating all tokens within a sequence concurrently. However, natural language exhibits a far more pronounced sequential dependency than images, and the majority of existing language models are trained with a left-to-right auto-regressive approach. To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that the generation of tokens on the right depends on the generated ones on the left, a mechanism achieved by employing a dynamic number of denoising steps that varies with token position. As a result, tokens on the left undergo fewer denoising steps than those on the right, enabling them to finish earlier and subsequently influence the generation of tokens on the right. In a series of experiments on various text generation tasks, including text summarization, machine translation, and common sense generation, AR-Diffusion clearly demonstrated its superiority over existing diffusion language models, while being $100\times$ to $600\times$ faster when achieving comparable results. Our code is available at https://github.com/microsoft/ProphetNet/tree/master/AR-diffusion.
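To make the position-dependent schedule concrete, here is a minimal sketch of the core idea, not the paper's exact multi-level schedule: at each decoding step, every token position is assigned its own diffusion timestep, with left positions less noised than right ones. The function name `token_timesteps`, the sweeping-anchor parameterization, and `max_t` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def token_timesteps(num_tokens: int, decode_step: int, num_decode_steps: int,
                    max_t: int = 1000) -> np.ndarray:
    """Hypothetical sketch of a position-dependent denoising schedule.

    A sweeping "anchor" moves left-to-right over the decoding run:
    positions behind it are fully denoised (t = 0), positions ahead of
    it are still at maximum noise, and positions near it are partially
    denoised. Left tokens therefore reach t = 0 earlier and can guide
    the denoising of tokens to their right.
    """
    # Sweep the anchor from -num_tokens to +num_tokens so every position
    # starts at max_t and ends at 0 by the final decoding step.
    progress = decode_step / num_decode_steps
    anchor = -num_tokens + progress * 2 * num_tokens
    positions = np.arange(num_tokens)
    # Timestep grows linearly with distance ahead of the anchor.
    t = (positions - anchor) / num_tokens * max_t
    return np.clip(t, 0, max_t).astype(int)

# Example: 8 tokens, 4 of 10 decoding steps done.
print(token_timesteps(8, decode_step=4, num_decode_steps=10))
```

Running this prints `[200 325 450 575 700 825 950 1000]`: the leftmost tokens are nearly clean while the rightmost are still fully noised, and sweeping the anchor further drives every position to `t = 0`.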
Author Information
Tong Wu (Tsinghua University)
Zhihao Fan (Alibaba Damo Academy)
Xiao Liu (Microsoft Research Asia)

My name is Xiao LIU (刘啸 in Chinese). I am a researcher in the [Natural Language Computing](https://www.microsoft.com/en-us/research/group/natural-language-computing/) group at [MSRA](https://www.msra.cn/). I obtained my Ph.D. degree from [BIT](http://www.bit.edu.cn/) in June 2022, under the supervision of Prof. [Heyan Huang](http://cs.bit.edu.cn/szdw/jsml/js/hhy/index.htm). My doctoral thesis is on event extraction and was recognized as an excellent Ph.D. thesis of Beijing Institute of Technology. My current research interests include **generation**, **retrieval** and **extraction**. You can find me at [Microsoft Research](https://www.microsoft.com/en-us/research/people/xiaoliu2/).
Hai-Tao Zheng (Tsinghua University)
Yeyun Gong (Microsoft)
Yelong Shen (Microsoft)
Jian Jiao (Microsoft)
Juntao Li (Soochow University, China)
Zhongyu Wei
Jian Guo
Nan Duan (Microsoft Research Asia)
Weizhu Chen (Microsoft Azure AI)
More from the Same Authors
- 2021 : CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
  Shuai Lu · Daya Guo · Shuo Ren · Junjie Huang · Alexey Svyatkovskiy · Ambrosio Blanco · Colin Clement · Dawn Drain · Daxin Jiang · Duyu Tang · Ge Li · Lidong Zhou · Linjun Shou · Long Zhou · Michele Tufano · Ming Gong · Ming Zhou · Nan Duan · Neel Sundaresan · Shao Kun Deng · Shengyu Fu · Shujie Liu
- 2021 : GPU-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning
  Xiao-Yang Liu · Zhuoran Yang · Zhaoran Wang · Anwar Walid · Jian Guo · Michael Jordan
- 2022 Poster: Less-forgetting Multi-lingual Fine-tuning
  Yuren Mao · Yaobo Liang · Nan Duan · Haobo Wang · Kai Wang · Lu Chen · Yunjun Gao
- 2022 : The Counterfactual-Shapley Value: Attributing Change in System Metrics
  Amit Sharma · Hua Li · Jian Jiao
- 2023 : Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing
  Xinyu Hu · Pengfei Tang · Simiao Zuo · Zihan Wang · Bowen Song · Qiang Lou · Jian Jiao · Denis Charles
- 2023 : HART: Efficient Adaptation via Regularized Autoregressive Parameter Generation
  Chen Liang · Nikos Karampatziakis · Tuo Zhao · Weizhu Chen
- 2023 : Evaluating Adversarial Defense in the Era of Large Language Models
  Joachim Studnia · Simiao Zuo · Xiaodong Liu · Qiang Lou · Jian Jiao · Denis Charles
- 2023 : CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
  Zhibin Gou · Zhihong Shao · Yeyun Gong · Yelong Shen · Yujiu Yang · Nan Duan · Weizhu Chen
- 2023 : An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
  Yadong Lu · Chunyuan Li · Haotian Liu · Jianwei Yang · Jianfeng Gao · Yelong Shen
- 2023 : Sparse Backpropagation for MoE Training
  Liyuan Liu · Jianfeng Gao · Weizhu Chen
- 2023 : [Paper-Oral 8] LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Yixiao Li · Yifan Yu · Chen Liang · Nikos Karampatziakis · Pengcheng He · Weizhu Chen · Tuo Zhao
- 2023 Poster: Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
  Zhendong Wang · Yifan Jiang · Huangjie Zheng · Peihao Wang · Pengcheng He · Zhangyang "Atlas" Wang · Weizhu Chen · Mingyuan Zhou
- 2023 Poster: On-the-Fly Adapting Code Summarization on Trainable Cost-Effective Language Models
  Yufan Cai · Yun Lin · Chenyan Liu · Jinglian Wu · Yifan Zhang · Yiming Liu · Yeyun Gong · Jin Song Dong
- 2023 Poster: Meet in the Middle: A New Pre-training Paradigm
  Anh Nguyen · Nikos Karampatziakis · Weizhu Chen
- 2023 Poster: In-Context Learning Unlocked for Diffusion Models
  Zhendong Wang · Yifan Jiang · Yadong Lu · Yelong Shen · Pengcheng He · Weizhu Chen · Zhangyang "Atlas" Wang · Mingyuan Zhou
- 2022 Poster: NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
  Jian Liang · Chenfei Wu · Xiaowei Hu · Zhe Gan · Jianfeng Wang · Lijuan Wang · Zicheng Liu · Yuejian Fang · Nan Duan
- 2022 Poster: Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping
  Hao Sun · Lei Han · Rui Yang · Xiaoteng Ma · Jian Guo · Bolei Zhou
- 2022 Poster: LogiGAN: Learning Logical Reasoning via Adversarial Pre-training
  Xinyu Pi · Wanjun Zhong · Yan Gao · Nan Duan · Jian-Guang Lou
- 2022 Poster: FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning
  Xiao-Yang Liu · Ziyi Xia · Jingyang Rui · Jiechao Gao · Hongyang Yang · Ming Zhu · Christina Wang · Zhaoran Wang · Jian Guo
- 2021 Poster: Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering
  Weijiang Yu · Haoteng Zheng · Mengfei Li · Lei Ji · Lijun Wu · Nong Xiao · Nan Duan
- 2021 Poster: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
  Ge Yang · Edward Hu · Igor Babuschkin · Szymon Sidor · Xiaodong Liu · David Farhi · Nick Ryder · Jakub Pachocki · Weizhu Chen · Jianfeng Gao
- 2021 Poster: Curriculum Learning for Vision-and-Language Navigation
  Jiwen Zhang · Zhongyu Wei · Jianqing Fan · Jiajie Peng
- 2021 Poster: R-Drop: Regularized Dropout for Neural Networks
  Xiaobo Liang · Lijun Wu · Juntao Li · Yue Wang · Qi Meng · Tao Qin · Wei Chen · Min Zhang · Tie-Yan Liu
- 2019 Poster: A Tensorized Transformer for Language Modeling
  Xindian Ma · Peng Zhang · Shuai Zhang · Nan Duan · Yuexian Hou · Ming Zhou · Dawei Song
- 2019 Poster: PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
  Yikang Li · Tao Ma · Yeqi Bai · Nan Duan · Sining Wei · Xiaogang Wang
- 2018 Poster: Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base
  Daya Guo · Duyu Tang · Nan Duan · Ming Zhou · Jian Yin
- 2014 Poster: Large-scale L-BFGS using MapReduce
  Weizhu Chen · Zhenghao Wang · Jingren Zhou
- 2014 Spotlight: Large-scale L-BFGS using MapReduce
  Weizhu Chen · Zhenghao Wang · Jingren Zhou