Attention is All you Need
Ashish Vaswani · Noam Shazeer · Niki Parmar · Jakob Uszkoreit · Llion Jones · Aidan Gomez · Łukasz Kaiser · Illia Polosukhin

Wed Dec 06 03:35 PM -- 03:40 PM (PST) @ Hall A

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best-performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous state-of-the-art single model by 0.7 BLEU, achieving a BLEU score of 41.1.

Author Information

Ashish Vaswani (Google Brain)
Noam Shazeer (Google)
Niki Parmar (Google)
Jakob Uszkoreit (Google, Inc.)
Llion Jones (Google)
Aidan Gomez (University of Toronto)
Łukasz Kaiser (Google Brain)
Illia Polosukhin
