NLP with Synthetic Text
Mohammad Norouzi
2021 Keynote Talk
in
Workshop: Efficient Natural Language and Speech Processing (Models, Training, and Inference)
Abstract
Synthetic data is successfully used to train powerful machine learning models for computer vision and robotics, thanks to the availability of high-fidelity graphics and physics-based simulation. But can synthetic data be successfully used to improve natural language processing? In this talk, I will advocate for the use of large language models as a great source of synthetic text. I will review recent work on data augmentation for NLP and describe a general framework for NLP with synthetic text, called “Generate, Annotate, and Learn”. I will highlight a few key results on generating unlabeled text for improving semi-supervised learning and knowledge distillation, in addition to advancing GPT-3-style few-shot learning.