The recent success of pre-trained language models crucially hinges on fine-tuning them on large amounts of labeled data for the downstream task, which is typically expensive to acquire or difficult to access for many applications. We study self-training, one of the earliest semi-supervised learning approaches, as a way to reduce the annotation bottleneck by making use of large-scale unlabeled data for the target task. The standard self-training mechanism randomly samples instances from the unlabeled pool to generate pseudo-labels and augment the labeled data. We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network, leveraging recent advances in Bayesian deep learning. Specifically, we propose (i) acquisition functions to select instances from the unlabeled pool leveraging Monte Carlo (MC) Dropout, and (ii) a learning mechanism leveraging model confidence for self-training. As an application, we focus on text classification with five benchmark datasets. We show that our methods, leveraging only 20-30 labeled samples per class for each task for training and for validation, perform within 3% of fully supervised pre-trained language models fine-tuned on thousands of labels, with an aggregate accuracy of 91% and an improvement of up to 12% over baselines.
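To make the idea of an uncertainty-based acquisition function concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which acquisition functions are used, so the sketch assumes one common choice, the BALD score (predictive entropy minus expected entropy across stochastic passes). The helper names `mc_dropout_summary` and `select_confident` are hypothetical, and the class probabilities from the MC Dropout forward passes are supplied as hand-written numbers in place of running a real network.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(q * math.log(q) for q in probs if q > 0.0)


def mc_dropout_summary(passes):
    """Summarize T stochastic forward passes for ONE unlabeled instance.

    `passes` is a list of T class-probability vectors, each assumed to come
    from a forward pass with dropout left active (hypothetical input here).
    Returns (pseudo_label, confidence, bald_score).
    """
    num_passes = len(passes)
    num_classes = len(passes[0])
    # Predictive mean: average the class probabilities over the T passes.
    mean = [sum(p[c] for p in passes) / num_passes for c in range(num_classes)]
    # BALD = entropy of the mean prediction - mean entropy of each pass.
    # It is high when the individual passes disagree with each other.
    bald = entropy(mean) - sum(entropy(p) for p in passes) / num_passes
    pseudo_label = max(range(num_classes), key=lambda c: mean[c])
    return pseudo_label, mean[pseudo_label], bald


def select_confident(pool_passes, k):
    """Pick the k pool instances with the LOWEST BALD score (assumed rule).

    Returns a list of (pool_index, pseudo_label, confidence) tuples; the
    confidence could later be used to weight each pseudo-labeled example.
    """
    scored = [(i, *mc_dropout_summary(p)) for i, p in enumerate(pool_passes)]
    scored.sort(key=lambda row: row[3])  # sort by BALD, ascending
    return [(i, label, conf) for i, label, conf, _ in scored[:k]]


# Toy pool: three instances, T = 3 passes each, two classes (made-up values).
pool = [
    [[0.90, 0.10], [0.92, 0.08], [0.88, 0.12]],  # passes agree on class 0
    [[0.90, 0.10], [0.10, 0.90], [0.50, 0.50]],  # passes disagree
    [[0.20, 0.80], [0.25, 0.75], [0.15, 0.85]],  # passes agree on class 1
]
selected = select_confident(pool, k=2)
```

In this toy pool, the two instances whose passes agree are selected for pseudo-labeling, while the instance with conflicting passes is left out. Selecting the lowest-uncertainty instances is only one possible design; an acquisition function could just as well favor high-uncertainty instances, or sample in proportion to the score instead of taking a hard top-k.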
Subhabrata Mukherjee (Microsoft Research)
I am a senior scientist at Microsoft Research (MSR) working at the intersection of natural language understanding, deep learning and transfer learning. My current research is focused on making AI accessible to all with two major themes: (1) Scaling deep and large-scale natural language understanding models to scenarios with limited computational resources leveraging techniques like self-supervised, weakly supervised and curriculum learning, data augmentation, knowledge distillation, etc. (2) Building trustworthy AI for mitigating misinformation and bias to provide fair and equitable information access to all. Prior to joining MSR, I was leading the information extraction efforts to build the Amazon Product Knowledge Graph, an authoritative knowledge graph for all products in the world. I graduated summa cum laude from the Max Planck Institute for Informatics, Germany with a PhD in 2017. I was awarded the 2018 SIGKDD Doctoral Dissertation Runner-up Award for my thesis on credibility analysis and misinformation.
Ahmed Awadallah (Microsoft)
Related Events (a corresponding poster, oral, or spotlight)
2020 Spotlight: Uncertainty-aware Self-training for Few-shot Text Classification »
Wed Dec 9th 03:30 -- 03:40 PM Room Orals & Spotlights: Continual/Meta/Misc Learning
More from the Same Authors
2021 : Few-Shot Learning Evaluation in Natural Language Understanding »
Subhabrata Mukherjee · Xiaodong Liu · Guoqing Zheng · Saghar Hosseini · Hao Cheng · Ge Yang · Christopher Meek · Ahmed Awadallah · Jianfeng Gao
2021 Poster: Fairness via Representation Neutralization »
Mengnan Du · Subhabrata Mukherjee · Guanchu Wang · Ruixiang Tang · Ahmed Awadallah · Xia Hu
2021 : IGLU: Interactive Grounded Language Understanding in a Collaborative Environment + Q&A »
· Ziming Li · Mohammad Aliannejadi · Maartje Anne ter Hoeve · Mikhail Burtsev · Alexey Skrynnik · Artem Zholus · Aleksandr Panov · Katja Hofmann · Kavya Srinet · Arthur Szlam · Michel Galley · Ahmed Awadallah