Training large models from scratch usually requires substantial resources. To address this problem, recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model (termed the "target model"), leading to a considerable acceleration in training. Despite their success, these studies grow pretrained models by mapping only part of the weights, ignoring potential correlations across the entire model. As we show in this paper, there are interactions both between and within the weights of the pretrained and target models. As a result, a partial mapping may fail to capture the complete information and can lead to inadequate growth. In this paper, we propose a method that linearly correlates each weight of the target model with all the weights of the pretrained model to further enhance acceleration. We use multi-linear operators to reduce computational and spatial complexity, keeping resource requirements acceptable. Experiments demonstrate that our method can save 76% of computational costs on DeiT-base transferred from DeiT-small, which outperforms bert2BERT by +12% and LiGO by +21%, respectively.
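As a rough illustration of the idea in the abstract, the sketch below grows a stack of small weight matrices into a larger stack using one small matrix per tensor mode (layer, input dimension, output dimension), so that every target weight is a linear combination of all pretrained weights. This is a minimal NumPy sketch under assumed shapes, not the authors' implementation: the function names, the three-mode layout, and the random (untrained) operators are all hypothetical choices made for the example.

```python
import numpy as np


def grow_weights(w_small, a_layer, a_in, a_out):
    """Map small weights (l, i, o) to large weights (L, I, O).

    Hypothetical multi-linear operator: one matrix per mode. Every entry of
    the result is a linear combination of ALL entries of w_small.
    """
    return np.einsum("lio,Ll,Ii,Oo->LIO", w_small, a_layer, a_in, a_out)


def full_map_params(small_shape, large_shape):
    """Parameter count of one dense linear map over the flattened weights."""
    return int(np.prod(small_shape)) * int(np.prod(large_shape))


def factored_params(small_shape, large_shape):
    """Parameter count of the per-mode (multi-linear) factorization."""
    return sum(s * t for s, t in zip(small_shape, large_shape))


rng = np.random.default_rng(0)
small_shape = (2, 3, 4)  # assumed: (layers, in_dim, out_dim) of the small model
large_shape = (3, 5, 6)  # assumed: same layout for the target model

w_small = rng.normal(size=small_shape)
# Random operators for illustration only; in practice they would be learned.
a_layer, a_in, a_out = (
    rng.normal(size=(t, s)) for s, t in zip(small_shape, large_shape)
)

w_large = grow_weights(w_small, a_layer, a_in, a_out)

# Reference: the same map written as one dense matrix (a Kronecker product)
# acting on the flattened small weights.
dense = np.kron(np.kron(a_layer, a_in), a_out)
w_dense = (dense @ w_small.reshape(-1)).reshape(large_shape)

print(w_large.shape)
print(np.allclose(w_large, w_dense))
print(full_map_params(small_shape, large_shape),
      factored_params(small_shape, large_shape))
```

The dense reference map is built here only to check equivalence on toy sizes; the point of the factored form is that it never has to be materialized, which is where the savings in computation and memory come from.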
Author Information
Yu Pan (Harbin Institute of Technology, Shenzhen)
Ye Yuan (Peking University)
Yichun Yin (Huawei Noah's Ark Lab)
Zenglin Xu (Harbin Institute of Technology, Shenzhen)
Lifeng Shang (Huawei Technologies Ltd.)
Xin Jiang (Noah’s Ark Lab, Huawei Technologies)
Qun Liu (Huawei Noah's Ark Lab)
More from the Same Authors
-
2022 Poster: Multi-view Subspace Clustering on Topological Manifold »
Shudong Huang · Hongjie Wu · Yazhou Ren · Ivor Tsang · Zenglin Xu · Wentao Feng · Jiancheng Lv -
2022 Poster: Alleviating the Sample Selection Bias in Few-shot Learning by Removing Projection to the Centroid »
Jing Xu · Xu Luo · Xinglin Pan · Yanan Li · Wenjie Pei · Zenglin Xu -
2022 Poster: TGEA 2.0: A Large-Scale Diagnostically Annotated Dataset with Benchmark Tasks for Text Generation of Pretrained Language Models »
Huibin Ge · Xiaohu Zhao · Chuang Liu · Yulong Zeng · Qun Liu · Deyi Xiong -
2023 Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants »
Mehdi Rezagholizadeh · Peyman Passban · Yue Dong · Yu Cheng · Soheila Samiee · Lili Mou · Qun Liu · Boxing Chen -
2023 Poster: Predicting Global Label Relationship Matrix for Graph Neural Networks under Heterophily »
Langzhang Liang · Xiangjing Hu · Zenglin Xu · Zixing Song · Irwin King -
2023 Poster: Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline »
Zangwei Zheng · Xiaozhe Ren · Fuzhao Xue · Yang Luo · Xin Jiang · Yang You -
2022 Spotlight: TGEA 2.0: A Large-Scale Diagnostically Annotated Dataset with Benchmark Tasks for Text Generation of Pretrained Language Models »
Huibin Ge · Xiaohu Zhao · Chuang Liu · Yulong Zeng · Qun Liu · Deyi Xiong -
2022 Spotlight: Lightning Talks 3B-2 »
Yu Huang · Tero Karras · Maxim Kodryan · Shiau Hong Lim · Shudong Huang · Ziyu Wang · Siqiao Xue · ILYAS MALIK · Ekaterina Lobacheva · Miika Aittala · Hongjie Wu · Yuhao Zhou · Yingbin Liang · Xiaoming Shi · Jun Zhu · Maksim Nakhodnov · Timo Aila · Yazhou Ren · James Zhang · Longbo Huang · Dmitry Vetrov · Ivor Tsang · Hongyuan Mei · Samuli Laine · Zenglin Xu · Wentao Feng · Jiancheng Lv -
2022 Spotlight: Multi-view Subspace Clustering on Topological Manifold »
Shudong Huang · Hongjie Wu · Yazhou Ren · Ivor Tsang · Zenglin Xu · Wentao Feng · Jiancheng Lv -
2022 Spotlight: Lightning Talks 1B-3 »
Chaofei Wang · Qixun Wang · Jing Xu · Long-Kai Huang · Xi Weng · Fei Ye · Harsh Rangwani · shrinivas ramasubramanian · Yifei Wang · Qisen Yang · Xu Luo · Lei Huang · Adrian G. Bors · Ying Wei · Xinglin Pan · Sho Takemori · Hong Zhu · Rui Huang · Lei Zhao · Yisen Wang · Kato Takashi · Shiji Song · Yanan Li · Rao Anwer · Yuhei Umeda · Salman Khan · Gao Huang · Wenjie Pei · Fahad Shahbaz Khan · Venkatesh Babu R · Zenglin Xu -
2022 Spotlight: Alleviating the Sample Selection Bias in Few-shot Learning by Removing Projection to the Centroid »
Jing Xu · Xu Luo · Xinglin Pan · Yanan Li · Wenjie Pei · Zenglin Xu -
2022 Workshop: Second Workshop on Efficient Natural Language and Speech Processing (ENLSP-II) »
Mehdi Rezagholizadeh · Peyman Passban · Yue Dong · Lili Mou · Pascal Poupart · Ali Ghodsi · Qun Liu -
2022 Poster: Towards Efficient Post-training Quantization of Pre-trained Language Models »
Haoli Bai · Lu Hou · Lifeng Shang · Xin Jiang · Irwin King · Michael R Lyu -
2022 Poster: Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark »
Jiaxi Gu · Xiaojun Meng · Guansong Lu · Lu Hou · Niu Minzhe · Xiaodan Liang · Lewei Yao · Runhui Huang · Wei Zhang · Xin Jiang · Chunjing XU · Hang Xu -
2021 Workshop: Efficient Natural Language and Speech Processing (Models, Training, and Inference) »
Mehdi Rezagholizadeh · Lili Mou · Yue Dong · Pascal Poupart · Ali Ghodsi · Qun Liu -
2021 Poster: Rectifying the Shortcut Learning of Background for Few-Shot Learning »
Xu Luo · Longhui Wei · Liangjian Wen · Jinrong Yang · Lingxi Xie · Zenglin Xu · Qi Tian -
2021 Poster: ByPE-VAE: Bayesian Pseudocoresets Exemplar VAE »
Qingzhong Ai · LIRONG HE · SHIYU LIU · Zenglin Xu -
2020 Poster: DynaBERT: Dynamic BERT with Adaptive Width and Depth »
Lu Hou · Zhiqi Huang · Lifeng Shang · Xin Jiang · Xiao Chen · Qun Liu -
2020 Spotlight: DynaBERT: Dynamic BERT with Adaptive Width and Depth »
Lu Hou · Zhiqi Huang · Lifeng Shang · Xin Jiang · Xiao Chen · Qun Liu -
2016 Poster: Distributed Flexible Nonlinear Tensor Factorization »
Shandian Zhe · Kai Zhang · Pengyuan Wang · Kuang-chih Lee · Zenglin Xu · Yuan Qi · Zoubin Ghahramani -
2013 Poster: Exact and Stable Recovery of Pairwise Interaction Tensors »
Shouyuan Chen · Michael R Lyu · Irwin King · Zenglin Xu -
2013 Spotlight: Exact and Stable Recovery of Pairwise Interaction Tensors »
Shouyuan Chen · Michael R Lyu · Irwin King · Zenglin Xu -
2010 Workshop: Machine Learning for Social Computing »
Zenglin Xu · Irwin King · Shenghuo Zhu · Yuan Qi · Rong Yan · John Yen -
2009 Poster: Adaptive Regularization for Transductive Support Vector Machine »
Zenglin Xu · Rong Jin · Jianke Zhu · Irwin King · Michael R Lyu · Zhirong Yang -
2009 Spotlight: Adaptive Regularization for Transductive Support Vector Machine »
Zenglin Xu · Rong Jin · Jianke Zhu · Irwin King · Michael R Lyu · Zhirong Yang -
2009 Poster: Heavy-Tailed Symmetric Stochastic Neighbor Embedding »
Zhirong Yang · Irwin King · Zenglin Xu · Erkki Oja -
2009 Spotlight: Heavy-Tailed Symmetric Stochastic Neighbor Embedding »
Zhirong Yang · Irwin King · Zenglin Xu · Erkki Oja -
2008 Poster: An Extended Level Method for Efficient Multiple Kernel Learning »
Zenglin Xu · Rong Jin · Irwin King · Michael R Lyu -
2007 Poster: Efficient Convex Relaxation for Transductive Support Vector Machine »
Zenglin Xu · Rong Jin · Jianke Zhu · Irwin King · Michael R Lyu