Attention Is All You Need
Computer Science
31st Conference on Neural Information Processing Systems (NIPS 2017)

A. Vaswani, N. Shazeer, et al.

The Transformer, a model developed by Ashish Vaswani, Noam Shazeer, and others, is a sequence transduction architecture built solely on attention mechanisms, with no recurrence or convolutions. It achieves state-of-the-art BLEU scores on the WMT 2014 English-to-German and English-to-French translation tasks while being more parallelizable and requiring significantly less time to train than earlier models.
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
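The abstract names attention as the model's only building block but does not spell out the computation. As a minimal sketch, assuming the scaled dot-product form softmax(QK^T / sqrt(d_k))V described in the body of the paper, the pure-Python function below computes attention over small lists of vectors. The function and variable names are illustrative and not taken from the authors' code, and the sketch omits multi-head projections, masking, and batching.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query.

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k (one per source position)
    values:  list of vectors of length d_v (one per source position)
    """
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)  # assumed scaling factor 1/sqrt(d_k)
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        # Normalize the scores into weights that sum to 1.
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        outputs.append(out)
    return outputs


# Two identical keys give equal weights, so the output is the mean of the values.
example = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[0.0, 0.0], [0.0, 0.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
)
print(example)  # [[2.0, 3.0]]
```

Each query is processed independently of the others in this sketch, which illustrates why the computation is easy to parallelize, in contrast to a recurrent model that must step through positions in order.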
Publisher
31st Conference on Neural Information Processing Systems (NIPS 2017)
Published On
Dec 01, 2017
Authors
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
Tags
transformer, machine translation, attention mechanism, BLEU score, parallelization, sequence transduction