We present a self-supervised learning framework, COCO-LM, that pretrains language models by COrrecting and COntrasting corrupted text sequences. Following ELECTRA-style pretraining, COCO-LM employs an auxiliary language model to corrupt text sequences, upon which it constructs two new pretraining tasks for the main model. The first, token-level task, Corrective Language Modeling, detects and corrects tokens replaced by the auxiliary model, in order to better capture token-level semantics. The second, sequence-level task, Sequence Contrastive Learning, aligns text sequences originating from the same source input while ensuring uniformity in the representation space. Experiments on GLUE and SQuAD demonstrate that COCO-LM not only outperforms recent state-of-the-art pretrained models in accuracy, but also improves pretraining efficiency: it matches the MNLI accuracy of ELECTRA with 50% of its pretraining GPU hours, and with the same number of pretraining steps as standard base- and large-sized models, it outperforms the previous best models by more than one point on the GLUE average score.
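The sequence-level objective can be illustrated with a minimal InfoNCE-style sketch: two corrupted views of each source sequence should embed close to each other (alignment) and far from other in-batch sequences (which encourages uniformity). This is a hedged, illustrative sketch only, assuming the two views are already embedded as plain vectors; the function names are hypothetical and this is not the authors' implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def seq_contrastive_loss(views_a, views_b, tau=0.1):
    """InfoNCE-style contrastive loss (illustrative sketch).

    views_a[i] and views_b[i] are embeddings of two corrupted views of
    the same source sequence; all other views_b[j] act as in-batch
    negatives. tau is the temperature.
    """
    n = len(views_a)
    total = 0.0
    for i in range(n):
        # Scaled similarities of view i against every candidate.
        sims = [cosine(views_a[i], views_b[j]) / tau for j in range(n)]
        # Numerically stable log-sum-exp over the candidates.
        m = max(sims)
        log_z = m + math.log(sum(math.exp(s - m) for s in sims))
        # Cross-entropy with the matching view as the positive.
        total += -(sims[i] - log_z)
    return total / n
```

With well-aligned view pairs the loss approaches zero, while mismatched pairs are penalized; in the full framework this term is combined with the token-level Corrective Language Modeling loss.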
Author Information
Yu Meng (University of Illinois at Urbana-Champaign)
Chenyan Xiong (Microsoft Research AI)
Payal Bajaj (Microsoft)
Saurabh Tiwary (Microsoft)
Paul Bennett (Microsoft Research)
Jiawei Han (University of Illinois at Urbana-Champaign)
Xia Song (Microsoft)
More from the Same Authors
- 2022 Poster: On the Representation Collapse of Sparse Mixture of Experts »
  Zewen Chi · Li Dong · Shaohan Huang · Damai Dai · Shuming Ma · Barun Patra · Saksham Singhal · Payal Bajaj · Xia Song · Xian-Ling Mao · Heyan Huang · Furu Wei
- 2022: Shift-Robust Node Classification via Graph Clustering Co-training »
  Qi Zhu · Chao Zhang · Chanyoung Park · Carl Yang · Jiawei Han
- 2022 Poster: Generating Training Data with Language Models: Towards Zero-Shot Language Understanding »
  Yu Meng · Jiaxin Huang · Yu Zhang · Jiawei Han
- 2021 Poster: Universal Graph Convolutional Networks »
  Di Jin · Zhizhi Yu · Cuiying Huo · Rui Wang · Xiao Wang · Dongxiao He · Jiawei Han
- 2021 Poster: Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data »
  Qi Zhu · Natalia Ponomareva · Jiawei Han · Bryan Perozzi
- 2021 Poster: Transfer Learning of Graph Neural Networks with Ego-graph Information Maximization »
  Qi Zhu · Carl Yang · Yidan Xu · Haonan Wang · Chao Zhang · Jiawei Han
- 2020 Poster: Towards Interpretable Natural Language Understanding with Explanations as Latent Variables »
  Wangchunshu Zhou · Jinyi Hu · Hanlin Zhang · Xiaodan Liang · Maosong Sun · Chenyan Xiong · Jian Tang
- 2020 Poster: Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point »
  Bita Darvish Rouhani · Daniel Lo · Ritchie Zhao · Ming Liu · Jeremy Fowers · Kalin Ovtcharov · Anna Vinogradsky · Sarah Massengill · Lita Yang · Ray Bittner · Alessandro Forin · Haishan Zhu · Taesik Na · Prerak Patel · Shuai Che · Lok Chand Koppaka · Xia Song · Subhojit Som · Kaustav Das · Saurabh K T · Steve Reinhardt · Sitaram Lanka · Eric Chung · Doug Burger
- 2019 Poster: Spherical Text Embedding »
  Yu Meng · Jiaxin Huang · Guangyuan Wang · Chao Zhang · Honglei Zhuang · Lance Kaplan · Jiawei Han
- 2014 Poster: Robust Tensor Decomposition with Gross Corruption »
  Quanquan Gu · Huan Gui · Jiawei Han
- 2012 Poster: Selective Labeling via Error Bound Minimization »
  Quanquan Gu · Tong Zhang · Chris Ding · Jiawei Han
- 2009 Poster: Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models »
  Jing Gao · Feng Liang · Wei Fan · Yizhou Sun · Jiawei Han