
CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings
Tatiana Likhomanenko · Qiantong Xu · Gabriel Synnaeve · Ronan Collobert · Alex Rogozhnikov

Thu Dec 09 04:30 PM -- 06:00 PM (PST) @ Virtual

Without positional information, attention-based Transformer neural networks are permutation-invariant. Absolute and relative positional embeddings are the most popular ways to provide Transformer models with positional information. Absolute positional embeddings are simple to implement, but suffer from generalization issues when evaluated on sequences longer than those seen at training time. Relative positional embeddings are more robust to changes in input length, but are more complex to implement and reduce model throughput due to extra computational and memory costs. In this paper, we propose an augmentation-based approach (CAPE) for absolute positional embeddings, which keeps the advantages of both absolute (simplicity and speed) and relative (better generalization) positional embeddings. In addition, our empirical evaluation on state-of-the-art models in machine translation, image recognition, and speech recognition demonstrates that CAPE yields better generalization as well as increased stability with respect to training hyper-parameters.
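
The core idea is to keep standard sinusoidal (absolute) embeddings but evaluate them at continuously augmented positions during training. Below is a minimal, hypothetical sketch of that idea in NumPy; the function names, augmentation ranges, and default values are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch of CAPE-style position augmentation.
# Names and augmentation ranges are assumptions for illustration only.
import numpy as np

def sinusoidal_embedding(positions, dim):
    """Standard sinusoidal embedding evaluated at continuous (float) positions."""
    inv_freq = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    angles = positions[:, None] * inv_freq[None, :]  # (seq_len, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def cape_positions(seq_len, rng, max_global_shift=5.0,
                   max_local_shift=0.5, max_global_scale_log=np.log(1.4)):
    """Augment integer positions with a global shift, per-position local
    jitter, and a global scaling factor (training time only; at inference
    the plain positions 0..seq_len-1 would be used)."""
    pos = np.arange(seq_len, dtype=np.float64)
    pos = pos + rng.uniform(-max_global_shift, max_global_shift)          # global shift
    pos = pos + rng.uniform(-max_local_shift, max_local_shift, seq_len)   # local jitter
    pos = pos * np.exp(rng.uniform(-max_global_scale_log,
                                   max_global_scale_log))                 # global scale
    return pos

rng = np.random.default_rng(0)
emb = sinusoidal_embedding(cape_positions(100, rng), dim=256)  # (100, 256)
```

Because the embedding is still computed from absolute positions, it keeps the simplicity and throughput of sinusoidal embeddings, while the random shifts and scaling discourage the model from memorizing exact training-length positions.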

Author Information

Tatiana Likhomanenko (Apple)
Qiantong Xu (Facebook AI Research)
Gabriel Synnaeve (Facebook)
Ronan Collobert (Facebook AI Research)
Alex Rogozhnikov (Herophilus)

Developing ML for scientific applications, author of einops
