Foundations of Attention Mechanisms in Deep Neural Network Architectures
Pierre Baldi · Roman Vershynin
Fri Dec 02 08:25 AM -- 08:35 AM (PST)
Event URL: https://app.sli.do/event/bayr24RBpGdcveCqzPfdR6
We consider the foundations of attention mechanisms in deep neural network architectures and present three main results. First, we provide a systematic taxonomy that organizes all possible attention mechanisms, within or as extensions of the McCulloch and Pitts standard model, into 18 classes depending on the origin of the attention signal, the target of the attention signal, and whether the interaction is additive or multiplicative. Second, using this taxonomy, we identify three key attention mechanisms: output gating, synaptic gating, and multiplexing. Output gating and synaptic gating are extensions of the standard model, and all current attention-based architectures, including transformers, use output gating, synaptic gating, or a combination of both. Third, we develop a theory of attention capacity and derive mathematical results about the capacity of basic attention networks. For example, the output gating of a linear threshold gate of $n$ variables by another linear threshold gate of the same $n$ variables has capacity $2n^2(1+o(1))$. Perhaps surprisingly, multiplexing attention is used in the proofs of these results. Synaptic and output gating provide computationally efficient extensions of the standard model allowing for {\it sparse} quadratic activation functions. They can also be viewed as primitives enabling the concise collapsing of multiple layers of processing in the standard model.
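As a minimal sketch (not code from the paper), the snippet below illustrates the two gating primitives named in the abstract applied to linear threshold gates over the same $n$ inputs: output gating multiplies the output of the gated gate by the attention gate, while synaptic gating multiplies a single synaptic weight of the gated gate before thresholding. The function names `threshold_gate`, `output_gating`, and `synaptic_gating`, and the choice of gating the k-th weight, are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def threshold_gate(w, b, x):
    """McCulloch-Pitts style linear threshold gate: 1 if w.x + b > 0, else 0."""
    return 1.0 if np.dot(w, x) + b > 0 else 0.0

def output_gating(w, v, x, bw=0.0, bv=0.0):
    """Output gating: the attention gate g(v.x) multiplies the *output* of the
    gated gate f(w.x), yielding a sparse quadratic interaction of the two units."""
    return threshold_gate(w, bw, x) * threshold_gate(v, bv, x)

def synaptic_gating(w, v, x, k=0, bw=0.0, bv=0.0):
    """Synaptic gating: the attention gate multiplies a single synaptic weight
    (here the k-th weight of w, an illustrative choice) before thresholding."""
    gate = threshold_gate(v, bv, x)
    w_gated = np.array(w, dtype=float)
    w_gated[k] *= gate
    return threshold_gate(w_gated, bw, x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    x, w, v = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
    print("output gating  :", output_gating(w, v, x))
    print("synaptic gating:", synaptic_gating(w, v, x, k=3))
```

For context, in the authors' earlier "On Neuronal Capacity" work capacity is measured as the base-2 logarithm of the number of Boolean functions an architecture can realize; since a single linear threshold gate of $n$ variables has capacity $n^2(1+o(1))$ (Zuev's theorem), the abstract's $2n^2(1+o(1))$ result says that output gating by a second gate on the same inputs roughly doubles this capacity.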
Author Information
Pierre Baldi (UC Irvine)
Roman Vershynin (UCI)
Related Events (a corresponding poster, oral, or spotlight)
- 2022: Foundations of Attention Mechanisms in Deep Neural Network Architectures
More from the Same Authors
- 2021: Deep learning reconstruction of the neutrino energy with a shallow Askaryan detector
  Stephen McAleer · Christian Glaser · Pierre Baldi
- 2021: G-SpaNet: Generalized Permutationless Set Assignment for Particle Physics using Symmetry Preserving Attention
  Alexander Shmakov · Shih-chieh Hsu · Pierre Baldi
- 2022: Geometry-aware Autoregressive Models for Calorimeter Shower Simulations
  Junze Liu · Aishik Ghosh · Dylan Smith · Pierre Baldi · Daniel Whiteson
- 2022: Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments
  JB Lanier · Stephen McAleer · Pierre Baldi · Roy Fox
- 2021 Poster: XDO: A Double Oracle Algorithm for Extensive-Form Games
  Stephen McAleer · JB Lanier · Kevin A Wang · Pierre Baldi · Roy Fox
- 2020 Poster: Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games
  Stephen McAleer · JB Lanier · Roy Fox · Pierre Baldi
- 2019 Poster: Modeling Dynamic Functional Connectivity with Latent Factor Gaussian Processes
  Lingge Li · Dustin Pluta · Babak Shahbaba · Norbert Fortin · Hernando Ombao · Pierre Baldi
- 2018 Poster: On Neuronal Capacity
  Pierre Baldi · Roman Vershynin
- 2018 Oral: On Neuronal Capacity
  Pierre Baldi · Roman Vershynin
- 2017: Poster session
  Abbas Zaidi · Christoph Kurz · David Heckerman · YiJyun Lin · Stefan Riezler · Ilya Shpitser · Songbai Yan · Olivier Goudet · Yash Deshpande · Judea Pearl · Jovana Mitrovic · Brian Vegetabile · Tae Hwy Lee · Karen Sachs · Karthika Mohan · Reagan Rose · Julius Ramakers · Negar Hassanpour · Pierre Baldi · Razieh Nabi · Noah Hammarlund · Eli Sherman · Carolin Lawrence · Fattaneh Jabbari · Vira Semenova · Maria Dimakopoulou · Pratik Gajane · Russell Greiner · Ilias Zadik · Alexander Blocker · Hao Xu · Tal EL HAY · Tony Jebara · Benoit Rostykus
- 2014 Workshop: High-energy particle physics, machine learning, and the HiggsML data challenge (HEPML)
  Glen Cowan · Balázs Kégl · Kyle Cranmer · Gábor Melis · Tim Salimans · Vladimir Vava Gligorov · Daniel Whiteson · Lester Mackey · Wojciech Kotlowski · Roberto Díaz Morales · Pierre Baldi · Cecile Germain · David Rousseau · Isabelle Guyon · Tianqi Chen
- 2014 Poster: Searching for Higgs Boson Decay Modes with Deep Learning
  Peter Sadowski · Daniel Whiteson · Pierre Baldi
- 2014 Spotlight: Searching for Higgs Boson Decay Modes with Deep Learning
  Peter Sadowski · Daniel Whiteson · Pierre Baldi
- 2013 Poster: Understanding Dropout
  Pierre Baldi · Peter Sadowski
- 2013 Oral: Understanding Dropout
  Pierre Baldi · Peter Sadowski
- 2012 Poster: Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction
  Pietro Di Lena · Pierre Baldi · Ken Nagata
- 2012 Spotlight: Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction
  Pietro Di Lena · Pierre Baldi · Ken Nagata
- 2011 Poster: A Machine Learning Approach to Predict Chemical Reactions
  Matthew A Kayala · Pierre Baldi
- 2010 Workshop: Charting Chemical Space: Challenges and Opportunities for AI and Machine Learning
  Pierre Baldi · Klaus-Robert Müller · Gisbert Schneider
- 2007 Poster: Mining Internet-Scale Software Repositories
  Erik Linstead · Paul Rigor, Ph.D. · sushil bajracharya · cristina lopes · Pierre Baldi
- 2006 Poster: A Scalable Machine Learning Approach to Go
  Lin Wu · Pierre Baldi
- 2006 Talk: A Scalable Machine Learning Approach to Go
  Lin Wu · Pierre Baldi