Foundations of Attention Mechanisms in Deep Neural Network Architectures
Pierre Baldi · Roman Vershynin
We consider the foundations of attention mechanisms in deep neural network architectures and present three main results. First, we provide a systematic taxonomy of all possible attention mechanisms within, or as extensions of, the McCulloch and Pitts standard model, organized into 18 classes depending on the origin type of the attention signal, the target type of the attention signal, and whether the interaction type is additive or multiplicative. Second, using this taxonomy, we identify three key attention mechanisms: output gating, synaptic gating, and multiplexing. Output gating and synaptic gating are extensions of the standard model, and all current attention-based architectures, including transformers, use either output gating or synaptic gating, or a combination of both. Third, we develop a theory of attention capacity and derive mathematical results about the capacity of basic attention networks. For example, the output gating of a linear threshold gate of $n$ variables by another linear threshold gate of the same $n$ variables has capacity $2n^2 (1+o(1))$. Perhaps surprisingly, multiplexing attention is used in the proofs of these results. Synaptic and output gating provide computationally efficient extensions of the standard model allowing for sparse quadratic activation functions. They can also be viewed as primitives enabling the concise collapsing of multiple layers of processing in the standard model.
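As a rough illustration of the two gating mechanisms named in the abstract, the following Python sketch may help. It is not taken from the paper: the function names, the 0/1 output convention for threshold gates, and the reading of output gating as a product of unit outputs and of synaptic gating as a multiplier on a single weight are all assumptions made here for illustration.

```python
from typing import Sequence


def threshold_gate(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Linear threshold gate; the 0/1 output convention is an assumption."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def output_gating(w_f, b_f, w_g, b_g, x) -> int:
    """Assumed form of output gating: the output of the gated unit f is
    multiplied by the output of the attending unit g on the same inputs."""
    return threshold_gate(w_f, b_f, x) * threshold_gate(w_g, b_g, x)


def synaptic_gating(weights, bias, x, gated_index: int, gate_value: float) -> int:
    """Assumed form of synaptic gating: the attending signal multiplies
    one synaptic weight of the gated unit before it computes its output."""
    gated = list(weights)
    gated[gated_index] *= gate_value
    return threshold_gate(gated, bias, x)


# Hypothetical example: f computes OR, g computes "x1 and not x2".
f_w, f_b = [1.0, 1.0], -0.5
g_w, g_b = [1.0, -1.0], -0.5
gated_outputs = [output_gating(f_w, f_b, g_w, g_b, x)
                 for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

With 0/1 outputs, output gating here acts as a conjunction of the two gates, so the gated unit passes its value only where the attending unit fires. In the synaptic case, a gate value of 0 effectively removes one input from the gated unit.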
Author Information
Pierre Baldi (UC Irvine)
Roman Vershynin (UC Irvine)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 : Foundations of Attention Mechanisms in Deep Neural Network Architectures »
Fri. Dec 2nd 04:25 -- 04:35 PM Room
More from the Same Authors
-
2021 : Deep learning reconstruction of the neutrino energy with a shallow Askaryan detector »
Stephen McAleer · Christian Glaser · Pierre Baldi -
2021 : G-SpaNet: Generalized Permutationless Set Assignment for Particle Physics using Symmetry Preserving Attention »
Alexander Shmakov · Shih-chieh Hsu · Pierre Baldi -
2022 : Geometry-aware Autoregressive Models for Calorimeter Shower Simulations »
Junze Liu · Aishik Ghosh · Dylan Smith · Pierre Baldi · Daniel Whiteson -
2022 : Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments »
JB Lanier · Stephen McAleer · Pierre Baldi · Roy Fox -
2023 Poster: End-To-End Latent Variational Diffusion Models for Inverse Problems in High Energy Physics »
Alexander Shmakov · Kevin Greif · Michael Fenton · Aishik Ghosh · Pierre Baldi · Daniel Whiteson -
2023 Poster: Language Models can Solve Computer Tasks »
Geunwoo Kim · Pierre Baldi · Stephen McAleer -
2023 Poster: AI for Interpretable Chemistry: Predicting Radical Mechanistic Pathways via Contrastive Learning »
Mohammadamin Tavakoli · Pierre Baldi · Ann Marie Carlton · Yin Ting Chiu · Alexander Shmakov · David Van Vranken -
2023 Poster: ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate models »
Sungduk Yu · Walter Hannah · Liran Peng · Jerry Lin · Mohamed Aziz Bhouri · Ritwik Gupta · Björn Lütjens · Justus Will · Gunnar Behrens · Nora Loose · Charles Stern · Tom Beucler · Bryce Harrop · Benjamin Hillman · Andrea Jenney · Savannah Ferretti · Nana Liu · Animashree Anandkumar · Noah Brenowitz · Veronika Eyring · Nicholas Geneva · Pierre Gentine · Stephan Mandt · Jaideep Pathak · Akshay Subramaniam · Carl Vondrick · Rose Yu · Laure Zanna · Ryan Abernathey · Fiaz Ahmed · David Bader · Pierre Baldi · Elizabeth Barnes · Christopher Bretherton · Julius Busecke · Peter Caldwell · Wayne Chuang · Yilun Han · YU HUANG · Fernando Iglesias-Suarez · Sanket Jantre · Karthik Kashinath · Marat Khairoutdinov · Thorsten Kurth · Nicholas Lutsko · Po-Lun Ma · Griffin Mooers · J. David Neelin · David Randall · Sara Shamekh · Mark Taylor · Nathan Urban · Janni Yuval · Guang Zhang · Tian Zheng · Mike Pritchard -
2023 Oral: ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate models »
Sungduk Yu · Walter Hannah · Liran Peng · Jerry Lin · Mohamed Aziz Bhouri · Ritwik Gupta · Björn Lütjens · Justus Will · Gunnar Behrens · Nora Loose · Charles Stern · Tom Beucler · Bryce Harrop · Benjamin Hillman · Andrea Jenney · Savannah Ferretti · Nana Liu · Animashree Anandkumar · Noah Brenowitz · Veronika Eyring · Nicholas Geneva · Pierre Gentine · Stephan Mandt · Jaideep Pathak · Akshay Subramaniam · Carl Vondrick · Rose Yu · Laure Zanna · Ryan Abernathey · Fiaz Ahmed · David Bader · Pierre Baldi · Elizabeth Barnes · Christopher Bretherton · Julius Busecke · Peter Caldwell · Wayne Chuang · Yilun Han · YU HUANG · Fernando Iglesias-Suarez · Sanket Jantre · Karthik Kashinath · Marat Khairoutdinov · Thorsten Kurth · Nicholas Lutsko · Po-Lun Ma · Griffin Mooers · J. David Neelin · David Randall · Sara Shamekh · Mark Taylor · Nathan Urban · Janni Yuval · Guang Zhang · Tian Zheng · Mike Pritchard -
2021 Poster: XDO: A Double Oracle Algorithm for Extensive-Form Games »
Stephen McAleer · JB Lanier · Kevin A Wang · Pierre Baldi · Roy Fox -
2020 Poster: Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games »
Stephen McAleer · JB Lanier · Roy Fox · Pierre Baldi -
2019 Poster: Modeling Dynamic Functional Connectivity with Latent Factor Gaussian Processes »
Lingge Li · Dustin Pluta · Babak Shahbaba · Norbert Fortin · Hernando Ombao · Pierre Baldi -
2018 Poster: On Neuronal Capacity »
Pierre Baldi · Roman Vershynin -
2018 Oral: On Neuronal Capacity »
Pierre Baldi · Roman Vershynin -
2017 : Poster session »
Abbas Zaidi · Christoph Kurz · David Heckerman · YiJyun Lin · Stefan Riezler · Ilya Shpitser · Songbai Yan · Olivier Goudet · Yash Deshpande · Judea Pearl · Jovana Mitrovic · Brian Vegetabile · Tae Hwy Lee · Karen Sachs · Karthika Mohan · Reagan Rose · Julius Ramakers · Negar Hassanpour · Pierre Baldi · Razieh Nabi · Noah Hammarlund · Eli Sherman · Carolin Lawrence · Fattaneh Jabbari · Vira Semenova · Maria Dimakopoulou · Pratik Gajane · Russell Greiner · Ilias Zadik · Alexander Blocker · Hao Xu · Tal EL HAY · Tony Jebara · Benoit Rostykus -
2014 Workshop: High-energy particle physics, machine learning, and the HiggsML data challenge (HEPML) »
Glen Cowan · Balázs Kégl · Kyle Cranmer · Gábor Melis · Tim Salimans · Vladimir Vava Gligorov · Daniel Whiteson · Lester Mackey · Wojciech Kotlowski · Roberto Díaz Morales · Pierre Baldi · Cecile Germain · David Rousseau · Isabelle Guyon · Tianqi Chen -
2014 Poster: Searching for Higgs Boson Decay Modes with Deep Learning »
Peter Sadowski · Daniel Whiteson · Pierre Baldi -
2014 Spotlight: Searching for Higgs Boson Decay Modes with Deep Learning »
Peter Sadowski · Daniel Whiteson · Pierre Baldi -
2013 Poster: Understanding Dropout »
Pierre Baldi · Peter Sadowski -
2013 Oral: Understanding Dropout »
Pierre Baldi · Peter Sadowski -
2012 Poster: Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction »
Pietro Di Lena · Pierre Baldi · Ken Nagata -
2012 Spotlight: Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction »
Pietro Di Lena · Pierre Baldi · Ken Nagata -
2011 Poster: A Machine Learning Approach to Predict Chemical Reactions »
Matthew A Kayala · Pierre Baldi -
2010 Workshop: Charting Chemical Space: Challenges and Opportunities for AI and Machine Learning »
Pierre Baldi · Klaus-Robert Müller · Gisbert Schneider -
2007 Poster: Mining Internet-Scale Software Repositories »
Erik Linstead · Paul Rigor · sushil bajracharya · cristina lopes · Pierre Baldi -
2006 Poster: A Scalable Machine Learning Approach to Go »
Lin Wu · Pierre Baldi -
2006 Talk: A Scalable Machine Learning Approach to Go »
Lin Wu · Pierre Baldi