Recent developments in neural models have connected the encoder and decoder through an attention mechanism. In particular, the Transformer, which relies solely on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, the multi-head attention mechanism, a key component of the Transformer, limits the model's effective deployment in resource-limited settings. In this paper, building on the ideas of tensor decomposition and parameter sharing, we propose a novel self-attention model (namely, Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We evaluate the proposed attention method on three language modeling tasks (PTB, WikiText-103, and One-Billion Word) and a neural machine translation task (WMT-2016 English-German). Multi-linear attention not only substantially compresses the model parameters but also improves performance, compared with a number of language modeling approaches such as Transformer, Transformer-XL, and Transformer with tensor-train decomposition.
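To make the compression idea concrete, below is a minimal PyTorch sketch of a block-term-style attention layer, not the authors' implementation. The class name `MultiLinearAttention`, the use of diagonal per-block cores, the averaging of block terms, and the `rank`/`n_blocks` values are illustrative assumptions; the paper's block-term cores may be full third-order tensors. What the sketch does show is the source of the parameter savings: a single set of Q/K/V factor matrices is shared across all blocks, and only the small per-block cores differ.

```python
import torch
import torch.nn as nn


class MultiLinearAttention(nn.Module):
    """Minimal sketch of block-term-style multi-linear attention.

    Q/K/V factor matrices are shared across blocks (parameter sharing);
    each block contributes a Tucker-like term via its own small core.
    """

    def __init__(self, d_model: int, n_blocks: int = 2, rank: int = 64):
        super().__init__()
        # Shared factor matrices: one set of projections for all blocks,
        # instead of separate per-head projections as in multi-head attention.
        self.q_proj = nn.Linear(d_model, rank, bias=False)
        self.k_proj = nn.Linear(d_model, rank, bias=False)
        self.v_proj = nn.Linear(d_model, rank, bias=False)
        # Assumption: one trainable *diagonal* core vector per block.
        self.cores = nn.Parameter(torch.randn(n_blocks, rank) / rank ** 0.5)
        self.out_proj = nn.Linear(rank, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
        # Each block reweights the shared value factor with its own core;
        # the block-term output here is the average of the per-block terms.
        terms = torch.stack([attn @ (v * g) for g in self.cores])
        return self.out_proj(terms.mean(dim=0))


# Usage: a single forward pass over a toy sequence.
x = torch.randn(10, 512)
y = MultiLinearAttention(d_model=512)(x)
print(y.shape)  # torch.Size([10, 512])
```

Under these assumptions, adding a block costs only `rank` extra parameters (one core vector) rather than a full set of Q/K/V projections, which is the intuition behind the parameter compression claimed in the abstract.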
Author Information
Xindian Ma (Tianjin University)
Peng Zhang (Tianjin University)
Shuai Zhang (Tianjin University)
Nan Duan (Microsoft Research Asia)
Yuexian Hou (Tianjin University)
Ming Zhou (Microsoft Research)
Dawei Song (Beijing Institute of Technology)