This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UniLM achieves new state-of-the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm.
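The abstract says the three pre-training objectives share one Transformer and differ only in the self-attention mask. The sketch below shows that idea in plain Python under stated assumptions. The function names are hypothetical and are not the API of the released code at the GitHub link. A mask entry of True means "position i may attend to position j". Blocked pairs get a score of -inf before the softmax, so they receive zero attention weight. The sequence-to-sequence layout assumes the source segment is packed before the target segment in one input. Only the left-to-right case of the unidirectional objective is shown (a right-to-left mask would be its transpose).

```python
import math
from typing import List

# mask[i][j] is True when position i may attend to position j (hypothetical convention)
Mask = List[List[bool]]


def bidirectional_mask(n: int) -> Mask:
    # Bidirectional objective: every token sees every token in the sequence.
    return [[True] * n for _ in range(n)]


def left_to_right_mask(n: int) -> Mask:
    # Unidirectional (left-to-right) objective: token i sees itself and tokens to its left.
    return [[j <= i for j in range(n)] for i in range(n)]


def seq2seq_mask(src_len: int, tgt_len: int) -> Mask:
    # Sequence-to-sequence objective, assuming source tokens come first:
    # source tokens see the whole source segment but nothing in the target;
    # target tokens see the whole source plus the target prefix up to themselves.
    n = src_len + tgt_len
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            if j < src_len:
                row.append(True)
            else:
                row.append(i >= src_len and j <= i)
        mask.append(row)
    return mask


def masked_attention_weights(scores: List[List[float]], mask: Mask) -> List[List[float]]:
    # Replace blocked scores with -inf, then apply a numerically stable softmax per row.
    # Every mask above lets a token attend to itself, so each row has a finite maximum.
    weights = []
    for score_row, mask_row in zip(scores, mask):
        logits = [s if ok else -math.inf for s, ok in zip(score_row, mask_row)]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Usage: a 2-token source followed by a 2-token target ("1" = may attend, "." = blocked).
for row in seq2seq_mask(src_len=2, tgt_len=2):
    print("".join("1" if ok else "." for ok in row))
# 11..
# 11..
# 111.
# 1111
```

Because only the mask changes between objectives, the same attention function (and the same network parameters) can serve all three, which is the point the abstract makes about a shared Transformer.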
Author Information
Li Dong (Microsoft Research)
Nan Yang (Microsoft Research Asia)
Wenhui Wang (Microsoft Research)
Furu Wei (Microsoft Research Asia)
Xiaodong Liu (Microsoft)
Yu Wang (Microsoft Research)
Jianfeng Gao (Microsoft Research, Redmond, WA)
Ming Zhou (Microsoft Research)
Hsiao-Wuen Hon (Microsoft Research)
More from the Same Authors
2021 Spotlight: Focal Attention for Long-Range Interactions in Vision Transformers
Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao
2021 : Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
Boxin Wang · Chejian Xu · Shuohang Wang · Zhe Gan · Yu Cheng · Jianfeng Gao · Ahmed Awadallah · Bo Li
2021 : Few-Shot Learning Evaluation in Natural Language Understanding
Subhabrata Mukherjee · Xiaodong Liu · Guoqing Zheng · Saghar Hosseini · Hao Cheng · Ge Yang · Christopher Meek · Ahmed Awadallah · Jianfeng Gao
2022 Poster: On the Representation Collapse of Sparse Mixture of Experts
Zewen Chi · Li Dong · Shaohan Huang · Damai Dai · Shuming Ma · Barun Patra · Saksham Singhal · Payal Bajaj · XIA SONG · Xian-Ling Mao · Heyan Huang · Furu Wei
2023 Poster: Bridging Discrete and Backpropagation: Straight-Through and Beyond
Liyuan Liu · Chengyu Dong · Xiaodong Liu · Bin Yu · Jianfeng Gao
2023 Poster: Extensible Prompts for Language Models on Zero-shot Language Style Customization
Tao Ge · Hu Jing · Li Dong · Shaoguang Mao · Yan Xia · Xun Wang · Si-Qing Chen · Furu Wei
2023 Poster: TextDiffuser: Diffusion Models as Text Painters
Jingye Chen · Yupan Huang · Tengchao Lv · Lei Cui · Qifeng Chen · Furu Wei
2023 Poster: Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Jae Sung Park · Jack Hessel · Khyathi Chandu · Paul Pu Liang · Ximing Lu · Qiuyuan Huang · Peter West · Jianfeng Gao · Ali Farhadi · Yejin Choi
2023 Poster: Guiding Large Language Models via Directional Stimulus Prompting
Zekun Li · Baolin Peng · Pengcheng He · Michel Galley · Jianfeng Gao · Xifeng Yan
2023 Poster: Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang · Li Dong · Wenhui Wang · Yaru Hao · Saksham Singhal · Shuming Ma · Tengchao Lv · Lei Cui · Owais Khan Mohammed · Barun Patra · Qiang Liu · Kriti Aggarwal · Zewen Chi · Nils Bjorck · Vishrav Chaudhary · Subhojit Som · XIA SONG · Furu Wei
2023 Poster: Segment Everything Everywhere All at Once
Xueyan Zou · Jianwei Yang · Hao Zhang · Feng Li · Linjie Li · Jianfeng Wang · Lijuan Wang · Jianfeng Gao · Yong Jae Lee
2023 Poster: Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Pan Lu · Baolin Peng · Hao Cheng · Michel Galley · Kai-Wei Chang · Ying Nian Wu · Song-Chun Zhu · Jianfeng Gao
2023 Poster: On the Pareto Front of Multilingual Neural Machine Translation
Liang Chen · Shuming Ma · Dongdong Zhang · Furu Wei · Baobao Chang
2023 Poster: Optimizing Prompts for Text-to-Image Generation
Yaru Hao · Zewen Chi · Li Dong · Furu Wei
2023 Poster: Language Models Augmented with Decoupled Memory
Weizhi Wang · Li Dong · Hao Cheng · Xiaodong Liu · Xifeng Yan · Jianfeng Gao · Furu Wei
2023 Poster: LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li · Cliff Wong · Sheng Zhang · Naoto Usuyama · Haotian Liu · Jianwei Yang · Tristan Naumann · Hoifung Poon · Jianfeng Gao
2023 Oral: Bridging Discrete and Backpropagation: Straight-Through and Beyond
Liyuan Liu · Chengyu Dong · Xiaodong Liu · Bin Yu · Jianfeng Gao
2022 Spotlight: Focal Modulation Networks
Jianwei Yang · Chunyuan Li · Xiyang Dai · Jianfeng Gao
2022 Spotlight: ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li · Haotian Liu · Liunian Li · Pengchuan Zhang · Jyoti Aneja · Jianwei Yang · Ping Jin · Houdong Hu · Zicheng Liu · Yong Jae Lee · Jianfeng Gao
2022 Spotlight: Fault-Aware Neural Code Rankers
Jeevana Priya Inala · Chenglong Wang · Mei Yang · Andres Codas · Mark Encarnación · Shuvendu Lahiri · Madanlal Musuvathi · Jianfeng Gao
2022 Poster: K-LITE: Learning Transferable Visual Models with External Knowledge
Sheng Shen · Chunyuan Li · Xiaowei Hu · Yujia Xie · Jianwei Yang · Pengchuan Zhang · Zhe Gan · Lijuan Wang · Lu Yuan · Ce Liu · Kurt Keutzer · Trevor Darrell · Anna Rohrbach · Jianfeng Gao
2022 Poster: Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou · Aishwarya Kamath · Zhe Gan · Pengchuan Zhang · Jianfeng Wang · Linjie Li · Zicheng Liu · Ce Liu · Yann LeCun · Nanyun Peng · Jianfeng Gao · Lijuan Wang
2022 Poster: ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li · Haotian Liu · Liunian Li · Pengchuan Zhang · Jyoti Aneja · Jianwei Yang · Ping Jin · Houdong Hu · Zicheng Liu · Yong Jae Lee · Jianfeng Gao
2022 Poster: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan (DK) Xu · Subhabrata Mukherjee · Xiaodong Liu · Debadeepta Dey · Wenhui Wang · Xiang Zhang · Ahmed Awadallah · Jianfeng Gao
2022 Poster: VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao · Wenhui Wang · Li Dong · Qiang Liu · Owais Khan Mohammed · Kriti Aggarwal · Subhojit Som · Songhao Piao · Furu Wei
2022 Poster: Focal Modulation Networks
Jianwei Yang · Chunyuan Li · Xiyang Dai · Jianfeng Gao
2022 Poster: Fault-Aware Neural Code Rankers
Jeevana Priya Inala · Chenglong Wang · Mei Yang · Andres Codas · Mark Encarnación · Shuvendu Lahiri · Madanlal Musuvathi · Jianfeng Gao
2022 Poster: GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang · Pengchuan Zhang · Xiaowei Hu · Yen-Chun Chen · Liunian Li · Xiyang Dai · Lijuan Wang · Lu Yuan · Jenq-Neng Hwang · Jianfeng Gao
2021 Poster: Focal Attention for Long-Range Interactions in Vision Transformers
Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao
2021 Poster: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Ge Yang · Edward Hu · Igor Babuschkin · Szymon Sidor · Xiaodong Liu · David Farhi · Nick Ryder · Jakub Pachocki · Weizhu Chen · Jianfeng Gao
2021 : WebQA Competition + Q&A
Yingshan CHANG · Yonatan Bisk · Mridu Narang · Levi Melnick · Jianfeng Gao · Hisami Suzuki · Guihong Cao
2020 Poster: BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou · Canwen Xu · Tao Ge · Julian McAuley · Ke Xu · Furu Wei
2020 Poster: MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang · Furu Wei · Li Dong · Hangbo Bao · Nan Yang · Ming Zhou
2018 Poster: M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search
Yelong Shen · Jianshu Chen · Po-Sen Huang · Yuqing Guo · Jianfeng Gao
2018 Poster: Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization
Yizhe Zhang · Michel Galley · Jianfeng Gao · Zhe Gan · Xiujun Li · Chris Brockett · Bill Dolan
2018 Poster: Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
Minjia Zhang · Wenhan Wang · Xiaodong Liu · Jianfeng Gao · Yuxiong He
2017 : Invited Talk: Microsoft (Asli and Jianfeng)
Jianfeng Gao
2015 Poster: End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture
Jianshu Chen · Ji He · Yelong Shen · Lin Xiao · Xiaodong He · Jianfeng Gao · Xinying Song · Li Deng