Despite the remarkable success of pre-trained language models (PLMs), they still face two challenges: First, large-scale PLMs are inefficient in terms of memory footprint and computation. Second, on downstream tasks, PLMs tend to rely on dataset bias and struggle to generalize to out-of-distribution (OOD) data. In response to the efficiency problem, recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting performance. Such subnetworks can be found in three scenarios: 1) in fine-tuned PLMs, 2) in raw PLMs that are subsequently fine-tuned in isolation, and even 3) in PLMs without any parameter fine-tuning. However, these results have only been obtained in the in-distribution (ID) setting. In this paper, we extend the study of PLM subnetworks to the OOD setting, investigating whether sparsity and robustness to dataset bias can be achieved simultaneously. To this end, we conduct extensive experiments with the pre-trained BERT model on three natural language understanding (NLU) tasks. Our results demonstrate that \textbf{sparse and robust subnetworks (SRNets) can consistently be found in BERT}, across the aforementioned three scenarios, using different training and compression methods. Furthermore, we explore the upper bound of SRNets using OOD information and show that \textbf{there exist sparse and almost unbiased BERT subnetworks}. Finally, we present 1) an analytical study that provides insights into how to make the SRNet search process more efficient and 2) a solution to improve subnetwork performance at high sparsity. The code is available at \url{https://github.com/llyx97/sparse-and-robust-PLM}.
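All three scenarios amount to selecting a binary mask over BERT's weights. As a rough illustration of the first scenario (pruning an already fine-tuned model), the sketch below applies local magnitude pruning with PyTorch and HuggingFace Transformers; the model name, sparsity level, and pruning criterion are illustrative assumptions, not the paper's actual subnetwork search procedure (see the linked repository for that).

```python
# Minimal illustrative sketch (not the paper's code): local magnitude pruning of a
# fine-tuned BERT encoder to expose a sparse subnetwork (scenario 1 above).
# Assumes PyTorch and HuggingFace Transformers; model name and sparsity are examples.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
sparsity = 0.7  # fraction of weights to zero out in each linear layer

with torch.no_grad():
    for name, w in model.bert.encoder.named_parameters():
        if w.dim() != 2:  # prune only the 2-D linear weight matrices
            continue
        k = int(sparsity * w.numel())
        threshold = w.abs().flatten().kthvalue(k).values
        w.mul_((w.abs() > threshold).float())  # the binary mask defines the subnetwork
```

In practice, the surviving weights would then be evaluated (or further trained) on both ID and OOD data to check whether sparsity and robustness hold at the same time.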
Author Information
Yuanxin Liu (Institute of Information Engineering, Chinese Academy of Sciences; SCS, University of Chinese Academy of Sciences)
Fandong Meng (WeChat AI)
Zheng Lin (Institute of Information Engineering, Chinese Academy of Sciences)
Jiangnan Li (Institute of Information Engineering, Chinese Academy of Sciences)
Peng Fu (Institute of Information Engineering, Chinese Academy of Sciences)
Yanan Cao (Institute of Information Engineering, Chinese Academy of Sciences)
Weiping Wang (Institute of Information Engineering, Chinese Academy of Sciences)
Jie Zhou (WeChat AI)
More from the Same Authors
- 2022 Spotlight: A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models
  Yuanxin Liu · Fandong Meng · Zheng Lin · Jiangnan Li · Peng Fu · Yanan Cao · Weiping Wang · Jie Zhou
- 2022 Spotlight: Lightning Talks 6A-1
  Ziyi Wang · Nian Liu · Yaming Yang · Qilong Wang · Yuanxin Liu · Zongxin Yang · Yizhao Gao · Yanchen Deng · Dongze Lian · Nanyi Fei · Ziyu Guan · Xiao Wang · Shufeng Kong · Xumin Yu · Daquan Zhou · Yi Yang · Fandong Meng · Mingze Gao · Caihua Liu · Yongming Rao · Zheng Lin · Haoyu Lu · Zhe Wang · Jiashi Feng · Zhaolin Zhang · Deyu Bo · Xinchao Wang · Chuan Shi · Jiangnan Li · Jiangtao Xie · Jie Zhou · Zhiwu Lu · Wei Zhao · Bo An · Jiwen Lu · Peihua Li · Jian Pei · Hao Jiang · Cai Xu · Peng Fu · Qinghua Hu · Yijie Li · Weigang Lu · Yanan Cao · Jianbin Huang · Weiping Wang · Zhao Cao · Jie Zhou
- 2022 Poster: Randomized Sketches for Clustering: Fast and Optimal Kernel $k$-Means
  Rong Yin · Yong Liu · Weiping Wang · Dan Meng
- 2021 Poster: Topology-Imbalance Learning for Semi-Supervised Node Classification
  Deli Chen · Yankai Lin · Guangxiang Zhao · Xuancheng Ren · Peng Li · Jie Zhou · Xu Sun
- 2020 Poster: Graph Geometry Interaction Learning
  Shichao Zhu · Shirui Pan · Chuan Zhou · Jia Wu · Yanan Cao · Bin Wang
- 2019 Poster: Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
  Fenglin Liu · Yuanxin Liu · Xuancheng Ren · Xiaodong He · Xu Sun
- 2018 Poster: Multi-Class Learning: From Theory to Algorithm
  Jian Li · Yong Liu · Rong Yin · Hua Zhang · Lizhong Ding · Weiping Wang