The recent success of the Transformer has benefited many real-world applications, thanks to its ability to build long-range dependencies through pairwise dot products. However, the strong assumption that elements are directly attentive to each other limits performance on tasks with high-order dependencies, such as natural language understanding and image captioning. To address this, we are the first to define Jump Self-attention (JAT) for building Transformers. Inspired by how pieces move in English draughts, we introduce a spectral convolutional technique to compute JAT on the dot-product feature map. This technique allows JAT to propagate within each self-attention head and makes it interchangeable with canonical self-attention. We further develop higher-order variants under the multi-hop assumption to increase generality. Moreover, the proposed architecture is compatible with pre-trained models. Through extensive experiments, we show empirically that our methods significantly improve performance on ten different tasks.
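The abstract gives enough detail to sketch the multi-hop idea in code. Below is a minimal, hypothetical sketch assuming the higher-order variant corresponds to raising the row-stochastic dot-product attention map to a power; the paper's actual spectral-convolution formulation may differ, and `jump_attention` and `hops` are illustrative names, not the authors' API.

```python
# Hypothetical sketch (not the authors' code): multi-hop "jump" attention,
# assuming the higher-order variant amounts to powers of the row-stochastic
# attention map. The paper's actual spectral-convolution technique may differ.
import torch
import torch.nn.functional as F

def jump_attention(q, k, v, hops=2):
    """q, k, v: (batch, seq_len, d_head); hops=1 recovers canonical attention."""
    d = q.size(-1)
    # Canonical scaled dot-product attention map (the feature map JAT starts from).
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    # "Jump": propagate the attention map `hops` times, letting tokens
    # attend through intermediate tokens (high-order dependencies).
    jump = torch.linalg.matrix_power(attn, hops)
    return jump @ v

q = k = v = torch.randn(2, 16, 64)
out = jump_attention(q, k, v, hops=2)
print(out.shape)  # torch.Size([2, 16, 64])
```

With `hops=1` this reduces to standard scaled dot-product attention, consistent with the abstract's claim that JAT is interchangeable with the canonical self-attention head.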
Author Information
Haoyi Zhou (Beihang University)
Siyang Xiao (Beijing University of Aeronautics and Astronautics)
Shanghang Zhang (UC Berkeley)
Jieqi Peng (Beihang University)
Shuai Zhang (Beihang University)
Jianxin Li (Beihang University)
More from the Same Authors
- 2022 Poster: Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation
  Yixiong Zou · Shanghang Zhang · Yuhua Li · Ruixuan Li
- 2022 Poster: Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
  Xiuying Wei · Yunchen Zhang · Xiangguo Zhang · Ruihao Gong · Shanghang Zhang · Qi Zhang · Fengwei Yu · Xianglong Liu
- 2023 Poster: PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection
  Qiang Zhou · Weize Li · Lihan Jiang · Guoliang Wang · Guyue Zhou · Shanghang Zhang · Hao Zhao
- 2023 Poster: Environment-Aware Dynamic Graph Learning for Out-of-Distribution Generalization
  Haonan Yuan · Qingyun Sun · Xingcheng Fu · Ziwei Zhang · Cheng Ji · Hao Peng · Jianxin Li
- 2023 Poster: Does Graph Distillation See Like Vision Dataset Counterpart?
  Beining Yang · Kai Wang · Qingyun Sun · Cheng Ji · Xingcheng Fu · Hao Tang · Yang You · Jianxin Li
- 2022 Spotlight: Lightning Talks 6B-3
  Lingfeng Yang · Yao Lai · Zizheng Pan · Zhenyu Wang · Weicong Liang · Chuanyang Zheng · Jian-Wei Zhang · Peng Jin · Jing Liu · Xiuying Wei · Yao Mu · Xiang Li · YUHUI YUAN · Zizheng Pan · Yifan Sun · Yunchen Zhang · Jianfei Cai · Hao Luo · zheyang li · Jinfa Huang · Haoyu He · Yi Yang · Ping Luo · Fenglin Liu · Henghui Ding · Borui Zhao · Xiangguo Zhang · Kai Zhang · Pichao WANG · Bohan Zhuang · Wei Chen · Ruihao Gong · Zhi Yang · Xian Wu · Feng Ding · Jianfei Cai · Xiao Luo · Renjie Song · Weihong Lin · Jian Yang · Wenming Tan · Bohan Zhuang · Shanghang Zhang · Shen Ge · Fan Wang · Qi Zhang · Guoli Song · Jun Xiao · Hao Li · Ding Jia · David Clifton · Ye Ren · Fengwei Yu · Zheng Zhang · Jie Chen · Shiliang Pu · Xianglong Liu · Chao Zhang · Han Hu
- 2022 Spotlight: Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
  Xiuying Wei · Yunchen Zhang · Xiangguo Zhang · Ruihao Gong · Shanghang Zhang · Qi Zhang · Fengwei Yu · Xianglong Liu
- 2022 Spotlight: Lightning Talks 2B-2
  Chenjian Gao · Rui Ding · Lingzhi LI · Fan Yang · Xingting Yao · Jianxin Li · Bing Su · Zhen Shen · Tongda Xu · Shuai Zhang · Ji-Rong Wen · Lin Guo · Fanrong Li · Kehua Guo · Zhongshu Wang · Zhi Chen · Xiangyuan Zhu · Zitao Mo · Dailan He · Hui Xiong · Yan Wang · Zheng Wu · Wenbing Tao · Jian Cheng · Haoyi Zhou · Li Shen · Ping Tan · Liwei Wang · Hongwei Qin
- 2022 Spotlight: AutoST: Towards the Universal Modeling of Spatio-temporal Sequences
  Jianxin Li · Shuai Zhang · Hui Xiong · Haoyi Zhou
- 2022 Workshop: Human in the Loop Learning (HiLL) Workshop at NeurIPS 2022
  Shanghang Zhang · Hao Dong · Wei Pan · Pradeep Ravikumar · Vittorio Ferrari · Fisher Yu · Xin Wang · Zihan Ding
- 2022 Poster: AutoST: Towards the Universal Modeling of Spatio-temporal Sequences
  Jianxin Li · Shuai Zhang · Hui Xiong · Haoyi Zhou
- 2020 Workshop: Self-Supervised Learning -- Theory and Practice
  Pengtao Xie · Shanghang Zhang · Pulkit Agrawal · Ishan Misra · Cynthia Rudin · Abdelrahman Mohamed · Wenzhen Yuan · Barret Zoph · Laurens van der Maaten · Xingyi Yang · Eric Xing