Poster
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
Tan Nguyen · Vai Suliafu · Stanley Osher · Long Chen · Bao Wang

Tue Dec 07 04:30 PM -- 06:00 PM (PST)
We propose FMMformers, a class of efficient and flexible transformers inspired by the celebrated fast multipole method (FMM) for accelerating interacting particle simulation. FMM decomposes particle-particle interaction into near-field and far-field components and then performs direct and coarse-grained computation, respectively. Similarly, FMMformers decompose the attention into near-field and far-field attention, modeling the near-field attention by a banded matrix and the far-field attention by a low-rank matrix. Computing the attention matrix for FMMformers requires only linear time and memory with respect to the sequence length, whereas standard transformers suffer from quadratic complexity. We analyze and validate the advantage of FMMformers over the standard transformer on the Long Range Arena and language modeling benchmarks. FMMformers can even outperform the standard transformer in terms of accuracy by a significant margin. For instance, FMMformers achieve an average classification accuracy of $60.74\%$ over the five Long Range Arena tasks, which is significantly better than the standard transformer's average accuracy of $58.70\%$.
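
To make the near-field/far-field split concrete, here is a minimal single-head sketch in PyTorch. It is illustrative only, not the authors' implementation: the function names, the bandwidth of 16, and the ELU-based feature map for the low-rank far-field term are assumptions chosen for the example.

    import torch
    import torch.nn.functional as F

    def near_field_attention(q, k, v, bandwidth=16):
        # Near field: exact softmax attention restricted to a band around the
        # diagonal. The dense (n x n) mask below is for clarity only; an
        # efficient version materializes just the band, costing O(n * bandwidth).
        n, d = q.shape
        scores = (q @ k.T) / d ** 0.5
        idx = torch.arange(n)
        band = (idx[None, :] - idx[:, None]).abs() <= bandwidth
        scores = scores.masked_fill(~band, float("-inf"))
        return scores.softmax(dim=-1) @ v

    def far_field_attention(q, k, v):
        # Far field: a low-rank surrogate in the style of linear attention.
        # With a positive feature map phi, attention factors as
        # phi(q) (phi(k)^T v) / (phi(q) phi(k)^T 1), which costs O(n * d^2).
        phi = lambda x: F.elu(x) + 1   # one common feature map (an assumption here)
        qp, kp = phi(q), phi(k)
        kv = kp.T @ v                  # (d, d_v) summary of all keys/values
        z = qp @ kp.sum(dim=0)         # (n,) normalizer
        return (qp @ kv) / (z.unsqueeze(-1) + 1e-6)

    # Combine the two fields; a full model would typically learn
    # per-head weights for each component.
    n, d = 128, 32
    q, k, v = (torch.randn(n, d) for _ in range(3))
    out = near_field_attention(q, k, v) + far_field_attention(q, k, v)
    print(out.shape)  # torch.Size([128, 32])

Because the near-field cost is O(n * bandwidth) when only the band is materialized, and the far-field cost is O(n * d^2), the combined attention scales linearly with the sequence length n.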

Author Information

Tan Nguyen (UCLA)

I am currently a postdoctoral scholar in the Department of Mathematics at the University of California, Los Angeles, working with Dr. Stanley J. Osher. I obtained my Ph.D. in Machine Learning from Rice University, where I was advised by Dr. Richard G. Baraniuk. My research focuses on the intersection of Deep Learning, Probabilistic Modeling, Optimization, and ODEs/PDEs. I gave an invited talk at the Deep Learning Theory Workshop at NeurIPS 2018 and organized the 1st Workshop on Integration of Deep Neural Models and Differential Equations at ICLR 2020. I also had two awesome long internships with Amazon AI and NVIDIA Research, during which I worked with Dr. Anima Anandkumar. I am the recipient of the prestigious Computing Innovation Postdoctoral Fellowship (CIFellows) from the Computing Research Association (CRA), the NSF Graduate Research Fellowship, and the IGERT Neuroengineering Traineeship. I received my MSEE and BSEE from Rice in May 2018 and May 2014, respectively.

Vai Suliafu (University of Utah)
Stanley Osher (UCLA)
Long Chen (University of California, Irvine)
Bao Wang (University of Utah)

More from the Same Authors

  • 2021 : Stan Osher Talk
    Stanley Osher
  • 2021 Poster: Heavy Ball Neural Ordinary Differential Equations
    Hedi Xia · Vai Suliafu · Hangjie Ji · Tan Nguyen · Andrea Bertozzi · Stanley Osher · Bao Wang
  • 2020 Poster: MomentumRNN: Integrating Momentum into Recurrent Neural Networks
    Tan Nguyen · Richard Baraniuk · Andrea Bertozzi · Stanley Osher · Bao Wang
  • 2020 Poster: Neural Networks with Recurrent Generative Feedback
    Yujia Huang · James Gornet · Sihui Dai · Zhiding Yu · Tan Nguyen · Doris Tsao · Anima Anandkumar
  • 2019 : Poster Session
    Pravish Sainath · Mohamed Akrout · Charles Delahunt · Nathan Kutz · Guangyu Robert Yang · Joseph Marino · L F Abbott · Nicolas Vecoven · Damien Ernst · andrew warrington · Michael Kagan · Kyunghyun Cho · Kameron Harris · Leopold Grinberg · John J. Hopfield · Dmitry Krotov · Taliah Muhammad · Erick Cobos · Edgar Walker · Jacob Reimer · Andreas Tolias · Alexander Ecker · Janaki Sheth · Yu Zhang · Maciej Wołczyk · Jacek Tabor · Szymon Maszke · Roman Pogodin · Dane Corneil · Wulfram Gerstner · Baihan Lin · Guillermo Cecchi · Jenna M Reinen · Irina Rish · Guillaume Bellec · Darjan Salaj · Anand Subramoney · Wolfgang Maass · Yueqi Wang · Ari Pakman · Jin Hyung Lee · Liam Paninski · Bryan Tripp · Colin Graber · Alex Schwing · Luke Prince · Gabriel Ocker · Michael Buice · Benjamin Lansdell · Konrad Kording · Jack Lindsey · Terrence Sejnowski · Matthew Farrell · Eric Shea-Brown · Nicolas Farrugia · Victor Nepveu · Jiwoong Im · Kristin Branson · Brian Hu · Ramakrishnan Iyer · Stefan Mihalas · Sneha Aenugu · Hananel Hazan · Sihui Dai · Tan Nguyen · Doris Tsao · Richard Baraniuk · Anima Anandkumar · Hidenori Tanaka · Aran Nayebi · Stephen Baccus · Surya Ganguli · Dean Pospisil · Eilif Muller · Jeffrey S Cheng · Gaël Varoquaux · Kamalaker Dadi · Dimitrios C Gklezakos · Rajesh PN Rao · Anand Louis · Christos Papadimitriou · Santosh Vempala · Naganand Yadati · Daniel Zdeblick · Daniela M Witten · Nicholas Roberts · Vinay Prabhu · Pierre Bellec · Poornima Ramesh · Jakob H Macke · Santiago Cadena · Guillaume Bellec · Franz Scherr · Owen Marschall · Robert Kim · Hannes Rapp · Marcio Fonseca · Oliver Armitage · Jiwoong Im · Thomas Hardcastle · Abhishek Sharma · Wyeth Bair · Adrian Valente · Shane Shang · Merav Stern · Rutuja Patil · Peter Wang · Sruthi Gorantla · Peter Stratton · Tristan Edwards · Jialin Lu · Martin Ester · Yurii Vlasov · Siavash Golkar
  • 2019 Poster: ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies
    Bao Wang · Zuoqiang Shi · Stanley Osher
  • 2018 : Contributed Talk 3
    Tan Nguyen
  • 2018 Poster: Deep Neural Nets with Interpolating Function as Output Activation
    Bao Wang · Xiyang Luo · Zhen Li · Wei Zhu · Zuoqiang Shi · Stanley Osher
  • 2016 Poster: A Probabilistic Framework for Deep Learning
    Ankit Patel · Tan Nguyen · Richard Baraniuk