Convolutional Sequence to Sequence Learning

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

2017-05-08ICML 2017 8Machine Translation Image Classification Bangla Spelling Error Correction Translation

Paper PDF Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code(official)Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code Code

Abstract

The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.

Results

Task	Dataset	Metric	Value	Model
Machine Translation	IWSLT2015 German-English	BLEU score	32.31	ConvS2S
Machine Translation	IWSLT2015 English-German	BLEU score	26.73	ConvS2S
Machine Translation	WMT2014 English-German	BLEU score	26.4	ConvS2S (ensemble)
Machine Translation	WMT2014 English-German	BLEU score	25.16	ConvS2S
Machine Translation	WMT2016 English-Romanian	BLEU score	29.9	ConvS2S BPE40k
Machine Translation	WMT2014 English-French	BLEU score	41.3	ConvS2S (ensemble)
Machine Translation	WMT2014 English-French	BLEU score	40.46	ConvS2S
Image Classification	MNIST	Accuracy	98.59	CNN Model by Som
Image Classification	MNIST	Percentage error	1.41	CNN Model by Som

Convolutional Sequence to Sequence Learning

Abstract

Results

Related Papers

Convolutional Sequence to Sequence Learning

Abstract

Results

Related Papers