Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However, the mechanisms used by networks to generate appropriate attention matrices are still mysterious. Moreover, how these mechanisms vary depending on the particular architecture used for the encoder and decoder (recurrent, feed-forward, etc.) is also not well understood. In this work, we investigate how encoder-decoder networks solve different sequence-to-sequence tasks. We introduce a way of decomposing hidden states over a sequence into temporal (independent of input) and input-driven (independent of sequence position) components. This reveals how attention matrices are formed: depending on the task requirements, networks rely more heavily on either the temporal or input-driven components. These findings hold across both recurrent and feed-forward architectures despite their differences in forming the temporal components. Overall, our results provide new insight into the inner workings of attention-based encoder-decoder networks.
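The decomposition described in the abstract can be illustrated with a small sketch. The abstract does not give the exact procedure, so the following is only one plausible reading, under stated assumptions: the temporal component is taken as the average hidden state at each sequence position across a batch of inputs, the input-driven component as the average of what remains for each token across all positions where it occurs, and whatever is left over is kept as a residual. The function name `decompose_hidden_states`, the array shapes, and the synthetic data are all hypothetical and chosen for illustration.

```python
import numpy as np

def decompose_hidden_states(hidden, tokens, vocab_size):
    """Split hidden states into temporal, input-driven, and residual parts.

    hidden: array of shape (N, T, d), hidden states for N sequences of length T.
    tokens: integer array of shape (N, T), the input token at each position.
    vocab_size: number of distinct token ids (assumed to be 0..vocab_size-1).
    """
    hidden = np.asarray(hidden, dtype=float)
    tokens = np.asarray(tokens)

    # Temporal component: mean over sequences at each position (independent of input).
    temporal = hidden.mean(axis=0)                      # shape (T, d)

    # Remove the temporal part, then average per token over all positions
    # where that token occurs (independent of sequence position).
    centered = hidden - temporal                        # shape (N, T, d)
    input_driven = np.zeros((vocab_size, hidden.shape[-1]))
    for v in range(vocab_size):
        mask = tokens == v                              # shape (N, T)
        if mask.any():
            input_driven[v] = centered[mask].mean(axis=0)

    # Residual: whatever the two components fail to explain.
    residual = centered - input_driven[tokens]          # shape (N, T, d)
    return temporal, input_driven, residual

# Synthetic example: hidden states built as an exact sum of a position-only
# term and a token-only term, with every token appearing once per position.
rng = np.random.default_rng(0)
V, T, d = 4, 5, 3
tokens = (np.arange(V)[:, None] + np.arange(T)[None, :]) % V   # shape (V, T)
true_temporal = rng.normal(size=(T, d))
true_input = rng.normal(size=(V, d))
hidden = true_temporal[None] + true_input[tokens]

temporal, input_driven, residual = decompose_hidden_states(hidden, tokens, V)
```

In this synthetic setting the residual vanishes, because the hidden states are purely additive in position and token. For a trained network, the relative sizes of the three parts would indicate how strongly the hidden states depend on position versus input.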
Author Information
Kyle Aitken (Allen Institute)
Vinay Ramasesh (Google)
Yuan Cao (Google Brain)
Niru Maheswaranathan (Meta Platforms, Inc.)
More from the Same Authors
- 2021: Efficient and Private Federated Learning with Partially Trainable Networks
  Hakim Sidahmed · Zheng Xu · Yuan Cao
- 2022: REACT: Synergizing Reasoning and Acting in Language Models
  Shunyu Yao · Jeffrey Zhao · Dian Yu · Izhak Shafran · Karthik Narasimhan · Yuan Cao
- 2023 Poster: Binarized Neural Machine Translation
  Yichi Zhang · Ankush Garg · Yuan Cao · Lukasz Lew · Behrooz Ghorbani · Zhiru Zhang · Orhan Firat
- 2023 Poster: Grammar Prompting for Domain-Specific Language Generation with Large Language Models
  Bailin Wang · Zi Wang · Xuezhi Wang · Yuan Cao · Rif A. Saurous · Yoon Kim
- 2023 Poster: Tree of Thoughts: Deliberate Problem Solving with Large Language Models
  Shunyu Yao · Dian Yu · Jeffrey Zhao · Izhak Shafran · Tom Griffiths · Yuan Cao · Karthik Narasimhan
- 2023 Oral: Tree of Thoughts: Deliberate Problem Solving with Large Language Models
  Shunyu Yao · Dian Yu · Jeffrey Zhao · Izhak Shafran · Tom Griffiths · Yuan Cao · Karthik Narasimhan
- 2022 Poster: Exploring Length Generalization in Large Language Models
  Cem Anil · Yuhuai Wu · Anders Andreassen · Aitor Lewkowycz · Vedant Misra · Vinay Ramasesh · Ambrose Slone · Guy Gur-Ari · Ethan Dyer · Behnam Neyshabur
- 2022 Poster: Solving Quantitative Reasoning Problems with Language Models
  Aitor Lewkowycz · Anders Andreassen · David Dohan · Ethan Dyer · Henryk Michalewski · Vinay Ramasesh · Ambrose Slone · Cem Anil · Imanol Schlag · Theo Gutman-Solo · Yuhuai Wu · Behnam Neyshabur · Guy Gur-Ari · Vedant Misra
- 2021: Contributed Talk 5: Efficient and Private Federated Learning with Partially Trainable Networks
  Hakim Sidahmed · Zheng Xu · Yuan Cao
- 2021 Poster: Reverse engineering learned optimizers reveals known and novel mechanisms
  Niru Maheswaranathan · David Sussillo · Luke Metz · Ruoxi Sun · Jascha Sohl-Dickstein
- 2020: Reverse engineering learned optimizers reveals known and novel mechanisms
  Niru Maheswaranathan · David Sussillo · Luke Metz · Ruoxi Sun · Jascha Sohl-Dickstein
- 2020 Poster: Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling
  Tong Che · Ruixiang ZHANG · Jascha Sohl-Dickstein · Hugo Larochelle · Liam Paull · Yuan Cao · Yoshua Bengio
- 2019 Poster: Universality and individuality in neural dynamics across large populations of recurrent networks
  Niru Maheswaranathan · Alex Williams · Matthew Golub · Surya Ganguli · David Sussillo
- 2019 Spotlight: Universality and individuality in neural dynamics across large populations of recurrent networks
  Niru Maheswaranathan · Alex Williams · Matthew Golub · Surya Ganguli · David Sussillo
- 2019 Poster: From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction
  Hidenori Tanaka · Aran Nayebi · Niru Maheswaranathan · Lane McIntosh · Stephen Baccus · Surya Ganguli
- 2019 Poster: Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
  Niru Maheswaranathan · Alex Williams · Matthew Golub · Surya Ganguli · David Sussillo