The recent success of pre-trained language models crucially hinges on fine-tuning them on large amounts of labeled data for the downstream task, which are typically expensive to acquire or difficult to access for many applications. We study self-training, one of the earliest semi-supervised learning approaches, to reduce the annotation bottleneck by making use of large-scale unlabeled data for the target task. The standard self-training mechanism randomly samples instances from the unlabeled pool to generate pseudo-labels and augment the labeled data. We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network, leveraging recent advances in Bayesian deep learning. Specifically, we propose (i) acquisition functions that select instances from the unlabeled pool using Monte Carlo (MC) Dropout, and (ii) a learning mechanism that leverages model confidence for self-training. As an application, we focus on text classification with five benchmark datasets. We show that our methods, using only 20-30 labeled samples per class per task for training and validation, perform within 3% of fully supervised pre-trained language models fine-tuned on thousands of labels, achieving an aggregate accuracy of 91% and improvements of up to 12% over baselines.
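For intuition, below is a minimal sketch (not the authors' released code) of how MC Dropout can be used to score unlabeled examples for pseudo-labeling in a self-training loop. It assumes a PyTorch classifier that returns logits; the function names, the BALD-style acquisition score, and the top-k selection policy are illustrative simplifications, not the paper's exact procedure.

# Minimal sketch of MC Dropout-based acquisition for self-training.
# Assumes a PyTorch classifier returning logits; names are illustrative.
import torch
import torch.nn.functional as F


def mc_dropout_predict(model, inputs, n_passes=10):
    """Run several stochastic forward passes with dropout active and return
    per-pass class probabilities of shape (n_passes, batch, classes)."""
    model.train()  # keep dropout enabled; in practice, freeze batch-norm statistics separately
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(inputs), dim=-1) for _ in range(n_passes)]
        )
    return probs


def bald_score(probs):
    """BALD-style acquisition: predictive entropy minus expected per-pass entropy.
    Higher values indicate greater model (epistemic) uncertainty."""
    mean_probs = probs.mean(dim=0)                                            # (batch, classes)
    predictive_entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)
    expected_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)
    return predictive_entropy - expected_entropy                              # (batch,)


def select_for_pseudo_labeling(model, unlabeled_batch, k=64):
    """Score an unlabeled batch by uncertainty and return the indices,
    pseudo-labels, and confidence weights of the k selected examples."""
    probs = mc_dropout_predict(model, unlabeled_batch)
    scores = bald_score(probs)
    top = torch.topk(scores, k=min(k, scores.numel())).indices
    mean_probs = probs.mean(dim=0)
    pseudo_labels = mean_probs[top].argmax(dim=-1)
    confidence = mean_probs[top].max(dim=-1).values   # can down-weight uncertain pseudo-labels
    return top, pseudo_labels, confidence

In a self-training round, the selected examples and their pseudo-labels would be added to the labeled set, with the confidence values available to down-weight noisy pseudo-labels during the next fine-tuning pass.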
Author Information
Subhabrata Mukherjee (Microsoft Research)
Principal Researcher at Microsoft Research, leading the cross-org initiative on Efficient AI at Scale. Our focus is on efficient learning of massive neural networks for both model efficiency (e.g., neural architecture search, model compression, sparse and modular learning) and data efficiency (e.g., zero-shot and few-shot learning, semi-supervised learning). We develop state-of-the-art computationally efficient models and techniques to enable AI practitioners, researchers, and engineers to use large-scale models in practice. Our technologies have been deployed in several enterprise scenarios, including Turing, Bing, and Microsoft 365. Honors: 2022 MIT Technology Review Innovators Under 35 semi-finalist (listed among 100 innovators under 35 worldwide) for work on Efficient AI.
Ahmed Awadallah (Microsoft)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Poster: Uncertainty-aware Self-training for Few-shot Text Classification
  Wed. Dec 9th 05:00 -- 07:00 PM, Room: Poster Session 3 #1015
More from the Same Authors
- 2021 : Few-Shot Learning Evaluation in Natural Language Understanding
  Subhabrata Mukherjee · Xiaodong Liu · Guoqing Zheng · Saghar Hosseini · Hao Cheng · Ge Yang · Christopher Meek · Ahmed Awadallah · Jianfeng Gao
- 2022 : Fifteen-minute Competition Overview Video
  Maartje Anne ter Hoeve · Mikhail Burtsev · Zoya Volovikova · Ziming Li · Yuxuan Sun · Shrestha Mohanty · Negar Arabzadeh · Mohammad Aliannejadi · Milagro Teruel · Marc-Alexandre Côté · Kavya Srinet · arthur szlam · Artem Zholus · Alexey Skrynnik · Aleksandr Panov · Ahmed Awadallah · Julia Kiseleva
- 2022 Competition: IGLU: Interactive Grounded Language Understanding in a Collaborative Environment
  Julia Kiseleva · Alexey Skrynnik · Artem Zholus · Shrestha Mohanty · Negar Arabzadeh · Marc-Alexandre Côté · Mohammad Aliannejadi · Milagro Teruel · Ziming Li · Mikhail Burtsev · Maartje Anne ter Hoeve · Zoya Volovikova · Aleksandr Panov · Yuxuan Sun · arthur szlam · Ahmed Awadallah · Kavya Srinet
- 2022 Poster: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
  Dongkuan (DK) Xu · Subhabrata Mukherjee · Xiaodong Liu · Debadeepta Dey · Wenhui Wang · Xiang Zhang · Ahmed Awadallah · Jianfeng Gao
- 2021 Poster: Fairness via Representation Neutralization
  Mengnan Du · Subhabrata Mukherjee · Guanchu Wang · Ruixiang Tang · Ahmed Awadallah · Xia Hu
- 2021 : IGLU: Interactive Grounded Language Understanding in a Collaborative Environment + Q&A
  Julia Kiseleva · Ziming Li · Mohammad Aliannejadi · Maartje Anne ter Hoeve · Mikhail Burtsev · Alexey Skrynnik · Artem Zholus · Aleksandr Panov · Katja Hofmann · Kavya Srinet · arthur szlam · Michel Galley · Ahmed Awadallah
Julia Kiseleva · Ziming Li · Mohammad Aliannejadi · Maartje Anne ter Hoeve · Mikhail Burtsev · Alexey Skrynnik · Artem Zholus · Aleksandr Panov · Katja Hofmann · Kavya Srinet · arthur szlam · Michel Galley · Ahmed Awadallah