Spotlight
Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems
Subhabrata Dutta · Tanya Gautam · Soumen Chakrabarti · Tanmoy Chakraborty
The Transformer and its variants have been proven to be efficient sequence learners in many different domains. Despite their staggering success, a critical issue has been the enormous number of parameters that must be trained (ranging from $10^7$ to $10^{11}$), along with the quadratic complexity of dot-product attention. In this work, we investigate the problem of approximating the two central components of the Transformer, multi-head self-attention and point-wise feed-forward transformation, with reduced parameter space and computational complexity. We build upon recent developments in analyzing deep neural networks as numerical solvers of ordinary differential equations. Taking advantage of an analogy between Transformer stages and the evolution of a dynamical system of multiple interacting particles, we formulate a temporal evolution scheme, TransEvolve, to bypass costly dot-product attention over multiple stacked layers. We perform exhaustive experiments with TransEvolve on well-known encoder-decoder as well as encoder-only tasks. We observe that the degree of approximation (or inversely, the degree of parameter reduction) affects performance differently depending on the task. While in the encoder-decoder regime TransEvolve delivers performance comparable to the original Transformer, in encoder-only tasks it consistently outperforms the Transformer and several of its subsequent variants.
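The analogy the abstract builds on is the well-known view of residual networks as Euler discretizations of an ODE: each residual block computes $x_{k+1} = x_k + F(x_k)$, i.e., one explicit Euler step of $\dot{x} = F(x)$. The sketch below is a hypothetical toy illustration of this idea (not the paper's code): token vectors are treated as interacting particles, a softmax-weighted averaging stands in for self-attention as the interaction "force", and stacked layers become repeated time steps.

```python
import numpy as np

def attention_like_force(X):
    # Toy interaction among particles (token vectors): softmax-weighted
    # averaging over pairwise dot-product scores, standing in for
    # self-attention. Returns the displacement toward the attended context.
    scores = X @ X.T
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X - X

def euler_evolve(X, n_steps=4, dt=0.5):
    # Evolve the particle system over "layers": each stacked layer is one
    # explicit Euler step x <- x + dt * F(x) of the ODE dx/dt = F(x).
    for _ in range(n_steps):
        X = X + dt * attention_like_force(X)
    return X

X0 = np.random.default_rng(0).normal(size=(5, 8))  # 5 tokens, hidden dim 8
X_final = euler_evolve(X0)
print(X_final.shape)
```

In this reading, reusing or approximating the interaction term across time steps, rather than recomputing full dot-product attention at every layer, is what allows a scheme like the one described in the abstract to cut parameters and computation.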
Author Information
Subhabrata Dutta (Jadavpur University)
Tanya Gautam (Indraprastha Institute of Information Technology Delhi)
Soumen Chakrabarti (Indian Institute of Technology Bombay)
Tanmoy Chakraborty (Indraprastha Institute of Information Technology Delhi)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems »
  Thu. Dec 9th 08:30 -- 10:00 AM
More from the Same Authors
- 2023 Poster: Locality Sensitive Hashing in Fourier Frequency Domain For Soft Set Containment Search »
  Indradyumna Roy · Rishi Agarwal · Soumen Chakrabarti · Anirban Dasgupta · Abir De
- 2022 Spotlight: Neural Estimation of Submodular Functions with Applications to Differentiable Subset Selection »
  Abir De · Soumen Chakrabarti
- 2022 Spotlight: Maximum Common Subgraph Guided Graph Retrieval: Late and Early Interaction Networks »
  Indradyumna Roy · Soumen Chakrabarti · Abir De
- 2022 Poster: Neural Estimation of Submodular Functions with Applications to Differentiable Subset Selection »
  Abir De · Soumen Chakrabarti
- 2022 Poster: Maximum Common Subgraph Guided Graph Retrieval: Late and Early Interaction Networks »
  Indradyumna Roy · Soumen Chakrabarti · Abir De
- 2021 Poster: Active Assessment of Prediction Services as Accuracy Surface Over Attribute Combinations »
  Vihari Piratla · Soumen Chakrabarti · Sunita Sarawagi