The attention mechanism is at the core of state-of-the-art Natural Language Processing (NLP) models, owing to its ability to focus on the most contextually relevant parts of a sequence. However, current attention models rely on "flat-view" matrix methods to process sequences of tokens embedded in vector spaces, which results in exceedingly high parameter complexity for practical applications. To this end, we introduce a novel Tensorized Spectral Attention (TSA) mechanism, which leverages the Graph Tensor Network (GTN) framework to efficiently process tensorized token embeddings via attention-based spectral graph filters. By virtue of multi-linear algebra, such tensorized token embeddings are shown to effectively bypass the Curse of Dimensionality, reducing the parameter complexity of the attention mechanism from exponential to linear in the weight matrix dimensions. Furthermore, the graph formulation of the attention domain enables the processing of tensorized embeddings through spectral graph convolution filters, which further increases the expressiveness of the mechanism. The benefits of TSA are demonstrated through five benchmark NLP experiments, where the proposed mechanism achieves results better than or comparable to traditional attention models, while incurring drastically lower parameter complexity.
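For intuition only, below is a minimal NumPy sketch, not the authors' TSA implementation: it contrasts the parameter count of a dense "flat-view" weight matrix with a tensor-train (tensor network) factorization of the same linear map, and passes the transformed token embeddings through a simple polynomial spectral graph filter over a toy token graph. The mode size d, number of modes N, TT-rank r, ring-graph adjacency, and filter coefficients theta are all illustrative assumptions.

import numpy as np

d, N, r = 4, 3, 2            # mode size, number of tensor modes, TT-rank (illustrative)
D = d ** N                   # flattened embedding dimension, d^N = 64

# "Flat-view" dense weight: D x D parameters, i.e. d^(2N) -- exponential in N.
dense_params = D * D

# Tensor-train (TT) cores G_n of shape (r_{n-1}, d, d, r_n), boundary ranks 1.
ranks = [1] + [r] * (N - 1) + [1]
cores = [np.random.randn(ranks[n], d, d, ranks[n + 1]) for n in range(N)]
tt_params = sum(c.size for c in cores)   # ~ N * d^2 * r^2 -- linear in N

print(f"dense: {dense_params} params, tensor-train: {tt_params} params")

def tt_to_matrix(cores):
    """Contract the TT cores back into the full D x D matrix (sanity check;
    in practice one contracts with the data and never materializes W)."""
    W = cores[0]                                   # (1, d, d, r)
    for C in cores[1:]:
        W = np.einsum('aijb,bklc->aikjlc', W, C)   # append one row/column mode
        a, i, k, j, l, c = W.shape
        W = W.reshape(a, i * k, j * l, c)
    return W[0, :, :, 0]

# Toy "attention graph" over T tokens: an undirected ring, purely illustrative.
T = 6
A = np.roll(np.eye(T), 1, axis=1) + np.roll(np.eye(T), -1, axis=1)
deg = A.sum(axis=1)
L = np.eye(T) - A / np.sqrt(np.outer(deg, deg))    # normalized graph Laplacian

X = np.random.randn(T, D)                          # flattened token embeddings
theta = [0.5, 0.3, 0.2]                            # illustrative filter coefficients

# Order-2 polynomial spectral graph filter applied to the TT-transformed tokens.
W = tt_to_matrix(cores)
H = sum(t * np.linalg.matrix_power(L, k) for k, t in enumerate(theta)) @ X @ W
print(H.shape)                                     # (T, D)

With d = 4 and N = 3 modes, the dense map needs d^(2N) = 4096 parameters while the TT cores need only 128, illustrating the exponential-to-linear reduction the abstract refers to.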
Author Information
Yao Lei Xu (Imperial College London)
Kriton Konstantinidis (Imperial College London)
Shengxi Li (Imperial College London)
Danilo Mandic (Imperial College London)
Related Events (a corresponding poster, oral, or spotlight)
- 2021: A Tensorized Spectral Attention Mechanism for Efficient Natural Language Processing
  Tue, Dec 14th, 08:05 -- 08:10 PM
More from the Same Authors
- 2021: Bayesian Tensor Networks
  Kriton Konstantinidis · Yao Lei Xu · Qibin Zhao · Danilo Mandic
- 2022 Spotlight: Lightning Talks 4B-4
  Ziyue Jiang · Zeeshan Khan · Yuxiang Yang · Chenze Shao · Yichong Leng · Zehao Yu · Wenguan Wang · Xian Liu · Zehua Chen · Yang Feng · Qianyi Wu · James Liang · C.V. Jawahar · Junjie Yang · Zhe Su · Songyou Peng · Yufei Xu · Junliang Guo · Michael Niemeyer · Hang Zhou · Zhou Zhao · Makarand Tapaswi · Dongfang Liu · Qian Yang · Torsten Sattler · Yuanqi Du · Haohe Liu · Jing Zhang · Andreas Geiger · Yi Ren · Long Lan · Jiawei Chen · Wayne Wu · Dahua Lin · Dacheng Tao · Xu Tan · Jinglin Liu · Ziwei Liu · 振辉 叶 · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu
- 2022 Spotlight: BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
  Yichong Leng · Zehua Chen · Junliang Guo · Haohe Liu · Jiawei Chen · Xu Tan · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu
- 2022 Poster: BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
  Yichong Leng · Zehua Chen · Junliang Guo · Haohe Liu · Jiawei Chen · Xu Tan · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu
- 2021: Danilo P. Mandic
  Danilo Mandic
- 2021: Multi-graph Tensor Networks: Big Data Analytics on Irregular Domains
  Danilo Mandic
- 2020: Poster 1: Multi-Graph Tensor Networks by Yao Lei Xu
  Yao Lei Xu
- 2020 Poster: Reciprocal Adversarial Learning via Characteristic Functions
  Shengxi Li · Zeyang Yu · Min Xiang · Danilo Mandic
- 2020 Spotlight: Reciprocal Adversarial Learning via Characteristic Functions
  Shengxi Li · Zeyang Yu · Min Xiang · Danilo Mandic
- 2011 Poster: A Multilinear Subspace Regression Method Using Orthogonal Tensors Decompositions
  Qibin Zhao · Cesar F Caiafa · Danilo Mandic · Liqing Zhang · Tonio Ball · Andreas Schulze-bonhage · Andrzej S CICHOCKI
- 2011 Spotlight: A Multilinear Subspace Regression Method Using Orthogonal Tensors Decompositions
  Qibin Zhao · Cesar F Caiafa · Danilo Mandic · Liqing Zhang · Tonio Ball · Andreas Schulze-bonhage · Andrzej S CICHOCKI