Metric: BLEU (higher is better)
| # | Model | BLEU | Extra Data | Paper | Date |
|---|---|---|---|---|---|
| 1 | BERT-fused NMT | 38.27 | No | Incorporating BERT into Neural Machine Translation | 2020-02-17 |
| 2 | MASS (6-layer Transformer) | 37.5 | No | MASS: Masked Sequence to Sequence Pre-training for Language Generation | 2019-05-07 |
| 3 | SMT + NMT (tuning and joint refinement) | 36.2 | No | An Effective Approach to Unsupervised Machine Translation | 2019-02-04 |
| 4 | MLM pretraining for encoder and decoder | 33.4 | No | Cross-lingual Language Model Pretraining | 2019-01-22 |
| 5 | GPT-3 175B (Few-Shot) | 32.6 | No | Language Models are Few-Shot Learners | 2020-05-28 |
| 6 | SMT as posterior regularization | 29.5 | No | Unsupervised Neural Machine Translation with SMT as Posterior Regularization | 2019-01-14 |
| 7 | PBSMT + NMT | 27.6 | No | Phrase-Based & Neural Unsupervised Machine Translation | 2018-04-20 |
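For reference, the BLEU metric used above is the geometric mean of modified n-gram precisions (n = 1..4) multiplied by a brevity penalty. The sketch below is a minimal, unsmoothed corpus-level implementation for illustration only; the published scores in this table come from tokenization-sensitive tools (e.g. sacreBLEU or `multi-bleu.perl`), so exact numbers will differ.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU (0-100), one reference per hypothesis, no smoothing.

    Inputs are lists of token lists. Matches and totals are accumulated
    over the whole corpus before taking precisions, as in standard BLEU.
    """
    match = [0] * max_n   # clipped n-gram matches per order
    total = [0] * max_n   # candidate n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clipped counts: a candidate n-gram is credited at most as
            # often as it appears in the reference.
            match[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if min(match) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

Because there is no smoothing, a single short sentence with no 4-gram match scores 0; real evaluation tools apply smoothing and a fixed tokenization so that scores like those in the table are comparable across papers.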