Uncertainty-aware Self-training for Few-shot Text Classification
Subhabrata Mukherjee · Ahmed Awadallah

Wed Dec 09 07:30 AM -- 07:40 AM (PST) @ Orals & Spotlights: Continual/Meta/Misc Learning

The recent success of pre-trained language models hinges on fine-tuning them with large amounts of labeled data for the downstream task, which is typically expensive to acquire or difficult to access for many applications. We study self-training, one of the earliest semi-supervised learning approaches, as a way to reduce the annotation bottleneck by making use of large-scale unlabeled data for the target task. The standard self-training mechanism randomly samples instances from the unlabeled pool to generate pseudo-labels and augment the labeled data. We propose an approach that improves self-training by incorporating uncertainty estimates of the underlying neural network, leveraging recent advances in Bayesian deep learning. Specifically, we propose (i) acquisition functions that select instances from the unlabeled pool using Monte Carlo (MC) Dropout, and (ii) a learning mechanism that leverages model confidence for self-training. As an application, we focus on text classification with five benchmark datasets. We show that our methods, using only 20-30 labeled samples per class per task for training and validation, perform within 3% of fully supervised pre-trained language models fine-tuned on thousands of labels, reaching an aggregate accuracy of 91% and improving over baselines by up to 12%.
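
The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the acquisition function, so the sketch assumes a BALD-style score (predictive entropy minus expected entropy) computed from MC Dropout passes, and it stands in a toy linear softmax classifier for the pre-trained language model. All function names and parameters (`mc_dropout_probs`, `bald_scores`, `select_for_pseudo_labeling`, `n_passes`, `drop_rate`) are hypothetical.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mc_dropout_probs(x, weights, n_passes=20, drop_rate=0.1, seed=0):
    """Run n_passes stochastic forward passes with dropout kept active.

    Toy stand-in for a neural classifier: dropout is applied to the input
    features of a linear softmax model (an assumption for illustration).
    Returns an array of shape (n_passes, n_samples, n_classes).
    """
    rng = np.random.default_rng(seed)
    passes = []
    for _ in range(n_passes):
        mask = rng.random(x.shape) >= drop_rate
        dropped = x * mask / (1.0 - drop_rate)  # inverted dropout scaling
        passes.append(softmax(dropped @ weights))
    return np.stack(passes)


def bald_scores(probs, eps=1e-12):
    """BALD-style score: predictive entropy minus expected entropy."""
    mean = probs.mean(axis=0)
    predictive_entropy = -(mean * np.log(mean + eps)).sum(axis=-1)
    expected_entropy = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    return predictive_entropy - expected_entropy


def select_for_pseudo_labeling(probs, k):
    """Pick the k instances with the lowest uncertainty score.

    Returns their indices, pseudo-labels (argmax of the mean prediction),
    and the mean predicted probability of the chosen label, which could
    serve as a per-example confidence weight during self-training.
    """
    scores = bald_scores(probs)
    chosen = np.argsort(scores)[:k]
    mean = probs.mean(axis=0)[chosen]
    return chosen, mean.argmax(axis=-1), mean.max(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    unlabeled = rng.normal(size=(50, 8))  # 50 unlabeled instances, 8 features
    weights = rng.normal(size=(8, 3))     # 3 classes
    probs = mc_dropout_probs(unlabeled, weights)
    idx, labels, conf = select_for_pseudo_labeling(probs, k=5)
    print(idx, labels, conf)
```

Selecting the lowest-scoring instances favors examples the model is already sure about. Sorting in the opposite direction would favor hard examples instead, and which choice works better is an empirical question that the abstract alone does not settle.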

Author Information

Subhabrata Mukherjee (Microsoft Research)

I am a senior scientist at Microsoft Research (MSR) working at the intersection of natural language understanding, deep learning and transfer learning. My current research is focused on making AI accessible to all with two major themes: (1) Scaling deep and large-scale natural language understanding models to scenarios with limited computational resources leveraging techniques like self-supervised, weakly supervised and curriculum learning, data augmentation, knowledge distillation, etc. (2) Building trustworthy AI for mitigating misinformation and bias to provide fair and equitable information access to all. Prior to joining MSR, I was leading the information extraction efforts to build the Amazon Product Knowledge Graph, an authoritative knowledge graph for all products in the world. I graduated summa cum laude from the Max Planck Institute for Informatics, Germany with a PhD in 2017. I was awarded the 2018 SIGKDD Doctoral Dissertation Runner-up Award for my thesis on credibility analysis and misinformation.

Ahmed Awadallah (Microsoft)
