Rico Sennrich, Barry Haddow, Alexandra Birch
We participated in the WMT 2016 shared news translation task by building neural translation systems for four language pairs, each trained in both directions: English<->Czech, English<->German, English<->Romanian and English<->Russian. Our systems are based on an attentional encoder-decoder, using BPE subword segmentation for open-vocabulary translation with a fixed vocabulary. We experimented with using automatic back-translations of the monolingual News corpus as additional training data, pervasive dropout, and target-bidirectional models. All reported methods give substantial improvements, and we see improvements of 4.3--11.2 BLEU over our baseline systems. In the human evaluation, our systems were the (tied) best constrained system for 7 out of 8 translation directions in which we participated.
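The BPE subword segmentation mentioned above learns a fixed-size vocabulary by iteratively merging the most frequent pair of adjacent symbols. A minimal sketch of that merge-learning loop is below (the toy vocabulary and merge count are illustrative only, not the authors' actual training setup):

```python
import re
import collections

def get_stats(vocab):
    """Count frequency of adjacent symbol pairs across the vocabulary."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_vocab(pair, vocab):
    """Merge every occurrence of the given symbol pair into one symbol."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq
            for word, freq in vocab.items()}

# Toy vocabulary: words pre-split into characters, '</w>' marks word end.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

# Learn 10 merge operations; in practice tens of thousands are learned.
for _ in range(10):
    pairs = get_stats(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    vocab = merge_vocab(best, vocab)
```

After these merges, frequent words such as "newest" become single symbols, while rare words remain decomposable into smaller known units, which is what enables open-vocabulary translation with a fixed vocabulary.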
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | WMT2016 English-Czech | BLEU score | 25.8 | Attentional encoder-decoder + BPE |
| Machine Translation | WMT2016 Czech-English | BLEU score | 31.4 | Attentional encoder-decoder + BPE |
| Machine Translation | WMT2016 English-German | BLEU score | 34.2 | Attentional encoder-decoder + BPE |
| Machine Translation | WMT2016 German-English | BLEU score | 38.6 | Attentional encoder-decoder + BPE |
| Machine Translation | WMT2016 English-Romanian | BLEU score | 28.1 | BiGRU |
| Machine Translation | WMT2016 Romanian-English | BLEU score | 33.3 | Attentional encoder-decoder + BPE |
| Machine Translation | WMT2016 English-Russian | BLEU score | 26.0 | Attentional encoder-decoder + BPE |
| Machine Translation | WMT2016 Russian-English | BLEU score | 28.0 | Attentional encoder-decoder + BPE |