Poster
Compositional De-Attention Networks
Yi Tay · Anh Tuan Luu · Aston Zhang · Shuohang Wang · Siu Cheung Hui

Wed Dec 11th 05:00 -- 07:00 PM @ East Exhibition Hall B + C #127

Attentional models are characterized by their ability to learn relative importance, i.e., to assign different weights to input values. This paper proposes a new quasi-attention that is compositional in nature, i.e., it learns whether to add, subtract or nullify a certain vector when learning representations. This contrasts sharply with vanilla attention, which simply re-weights input tokens. Our proposed Compositional De-Attention (CoDA) is built on the intuition of using both similarity and dissimilarity (negative affinity) when computing affinity scores, which gives it greater expressiveness. We evaluate CoDA on six NLP tasks: open-domain question answering, retrieval/ranking, natural language inference, machine translation, sentiment analysis and text2code generation. We obtain promising experimental results, achieving state-of-the-art performance on several tasks/datasets.
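
To make the add / subtract / nullify idea concrete, here is a minimal illustrative sketch in plain Python. It is not taken from the paper: the abstract does not give the affinity formula, so the specific choice below (tanh of a dot product for the sign, multiplied by a gate that decays with L1 distance) and the function names `signed_weights` and `compose` are assumptions made only for illustration. The point is that each weight lies in [-1, 1] rather than in [0, 1] as with softmax attention, so a value vector can be added, subtracted, or dropped.

```python
import math

def signed_weights(query, keys):
    """Return one signed weight per key (hypothetical formulation, see lead-in)."""
    weights = []
    for key in keys:
        # Similarity term: dot product, which can be positive or negative.
        similarity = sum(q * k for q, k in zip(query, key))
        # Dissimilarity term: L1 distance, always non-negative.
        distance = sum(abs(q - k) for q, k in zip(query, key))
        sign_term = math.tanh(similarity)            # in (-1, 1)
        decay = math.exp(-distance)                  # in (0, 1], no overflow
        gate = 2.0 * decay / (1.0 + decay)           # 2 * sigmoid(-distance), in (0, 1]
        weights.append(sign_term * gate)
    return weights

def compose(weights, values):
    """Weighted sum of value vectors; negative weights subtract, zero weights nullify."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

query = [1.0, 0.0]
keys = [[1.0, 0.0],    # aligned with the query: positive weight (add)
        [-1.0, 0.0],   # opposed to the query: negative weight (subtract)
        [0.0, 5.0]]    # orthogonal to the query: zero weight (nullify)
values = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

weights = signed_weights(query, keys)
output = compose(weights, values)
```

In this toy setup the first key contributes positively, the second negatively, and the third not at all, which is behavior that a softmax over the same scores could not produce because softmax weights are always positive.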

Author Information

Yi Tay (NTU, Singapore)
Anh Tuan Luu (MIT CSAIL)
Aston Zhang (Amazon AI)
Shuohang Wang (Singapore Management University)
Siu Cheung Hui (Nanyang Technological University)
