NeurIPS 2019 Expo Workshop

Nov. 28, 2022

From Research to Industrial NLP at Baidu

Sponsor: BAIDU

Hua Wu (BAIDU), Xiangyang Zhou (BAIDU), Yu Sun (BAIDU), Jing Liu (BAIDU), Zhongjun He (BAIDU), Yanjun Ma (BAIDU), Dianhai Yu (BAIDU), Hao Tian (BAIDU), Xing Li (BAIDU)

This workshop will present recent advances in NLP research and industrial applications at Baidu. We will first give an introductory talk on Baidu’s NLP work combined with our open-source deep learning platform, PaddlePaddle. The following talks will go into depth on specific NLP topics.

  1. Overview of NLP at Baidu, Hua Wu. This talk will give an overview of NLP at Baidu, including research and products.

  2. PaddlePaddle & PaddleNLP, Xiangyang Zhou. This talk will introduce our open-source deep learning platform, PaddlePaddle, followed by the frameworks and models customized for NLP tasks.

  3. ERNIE (Enhanced Representation through kNowledge IntEgration), Yu Sun. ERNIE is a new natural language understanding framework. Based on this framework, Baidu has also open-sourced a pre-trained language understanding model that achieved state-of-the-art results, outperforming BERT and the recent XLNet on 16 NLP tasks in both Chinese and English.

  4. Machine Reading Comprehension, Jing Liu. This talk will describe our research on machine reading comprehension (MRC), aimed at the practical challenges of applying MRC technologies to open-domain question answering in Baidu Search. Topics include multi-passage MRC, knowledge-enhanced MRC, and improving the robustness and generalization of MRC models, the last of which yielded the winning solution at MRQA 2019.

  5. Machine Translation, Zhongjun He. This talk will introduce the technologies we used in this year’s WMT evaluation campaign, in which our system ranked first in Chinese-English newswire translation. We will also introduce our work on simultaneous machine translation. We recently released a speech-to-speech simultaneous machine translation system that achieves performance comparable to human interpreters, delivering high-quality simultaneous speech translation with low latency.