
Fast Transformers with Clustered Attention
Apoorv Vyas · Angelos Katharopoulos · François Fleuret

Thu Dec 10 09:00 AM -- 11:00 AM (PST) @ Poster Session 5 #1412

Transformers have proven to be a successful model for a variety of tasks in sequence modeling. However, computing the attention matrix, which is their key component, has quadratic complexity with respect to the sequence length, making them prohibitively expensive for long sequences. To address this, we propose clustered attention, which, instead of computing the attention for every query, groups queries into clusters and computes attention just for the centroids. To further improve this approximation, we use the computed clusters to identify the keys with the highest attention per query and compute the exact key/query dot products. This results in a model with linear complexity with respect to the sequence length for a fixed number of clusters. We evaluate our approach on two automatic speech recognition datasets and show that our model consistently outperforms vanilla transformers for a given computational budget. Finally, we demonstrate that our model can approximate arbitrarily complex attention distributions with a minimal number of clusters by approximating a pretrained BERT model on GLUE and SQuAD benchmarks with only 25 clusters and no loss in performance.
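The core idea in the abstract, computing attention once per query cluster and sharing the result among that cluster's members, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`kmeans_assign`, `clustered_attention`, `full_attention`) are hypothetical, plain Euclidean k-means is swapped in as the clustering step, and the refinement that recomputes exact dot products for the highest-attention keys is left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Standard scaled dot-product attention: cost grows as N * N.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def kmeans_assign(Q, n_clusters, n_iters=10, seed=0):
    # Assumption: plain Euclidean k-means stands in for the clustering step.
    rng = np.random.default_rng(seed)
    centroids = Q[rng.choice(len(Q), size=n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        # Squared distance from every query to every centroid: (N, C).
        dists = ((Q[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = Q[labels == c]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[c] = members.mean(axis=0)
    return labels, centroids

def clustered_attention(Q, K, V, n_clusters):
    # Attention is computed only for the C centroids: cost grows as C * N.
    labels, centroids = kmeans_assign(Q, n_clusters)
    scores = centroids @ K.T / np.sqrt(Q.shape[-1])  # (C, N)
    centroid_out = softmax(scores) @ V               # (C, Dv)
    # Every query reuses the output of the centroid it was assigned to.
    return centroid_out[labels]

rng = np.random.default_rng(0)
Q = rng.normal(size=(16, 8))
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 4))
approx = clustered_attention(Q, K, V, n_clusters=4)
```

In this sketch, all queries in a cluster receive identical outputs, which is the coarse approximation that the abstract's second step is meant to improve. With as many clusters as queries, each query becomes its own centroid and the result matches full attention.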

Author Information

Apoorv Vyas (Idiap Research Institute and EPFL)
Angelos Katharopoulos (Idiap & EPFL)
François Fleuret (University of Geneva)

François Fleuret received a PhD in Mathematics from INRIA and the University of Paris VI in 2000, and a Habilitation degree in Mathematics from the University of Paris XIII in 2006. He is a Full Professor in the Department of Computer Science at the University of Geneva, and an Adjunct Professor in the School of Engineering of the École Polytechnique Fédérale de Lausanne. He has published more than 80 papers in peer-reviewed international conferences and journals. He is an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence, serves as Area Chair for NeurIPS, AAAI, and ICCV, and sits on the program committees of many top-tier international conferences in machine learning and computer vision. He has served as an expert for multiple funding agencies. He is the inventor of several patents in the field of machine learning, and a co-founder of Neural Concept SA, a company specializing in the development and commercialization of deep learning solutions for engineering design. His main research interest is machine learning, with a particular focus on computational aspects and sample efficiency.

More from the Same Authors