Metric: SacreBLEU (higher is better)
| # | Model | SacreBLEU | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Noisy back-translation | 33.8 | Yes | Understanding Back-Translation at Scale | 2018-08-28 | Code |
| 2 | Transformer Cycle (Rev) | 33.54 | No | Lessons on Parameter Sharing across Layers in Tr... | 2021-04-13 | Code |
| 3 | Transformer+Rep(Uni) | 32.35 | No | Rethinking Perturbations in Encoder-Decoders for... | 2021-04-05 | Code |
| 4 | MAT | 29.9 | No | Multi-branch Attentive Transformer | 2020-06-18 | Code |
| 5 | Transformer (ADMIN init) | 29.5 | No | Very Deep Transformers for Neural Machine Transl... | 2020-08-18 | Code |
| 6 | Evolved Transformer Big | 29.2 | No | The Evolved Transformer | 2019-01-30 | Code |
| 7 | Mega | 27.96 | No | Mega: Moving Average Equipped Gated Attention | 2022-09-21 | Code |
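The scores above are corpus-level BLEU, scaled 0-100. As a rough illustration of what the metric computes, here is a minimal pure-Python sketch of corpus BLEU (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty). This is a simplification: real SacreBLEU also applies a standardized tokenizer and signature-controlled settings, so use the `sacrebleu` package to reproduce reported numbers.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token list
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on whitespace-tokenized sentence pairs, 0-100 scale."""
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            # Counter intersection implements count clipping
            matches[n - 1] += sum((ngrams(h, n) & ngrams(r, n)).values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty discourages overly short hypotheses
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)

score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"])
# → 100.0 (identical hypothesis and reference)
```

A gap of one BLEU point between neighboring systems (e.g. 33.8 vs. 33.54 above) is small, which is why SacreBLEU's fixed tokenization matters for fair comparison.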