Machine Translation on WMT2014 English-German
Metric: BLEU score (higher is better)
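As a rough illustration of the metric, here is a minimal sketch of corpus-level BLEU: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes pre-tokenized input, one reference per hypothesis, and no smoothing. The function names `ngrams` and `corpus_bleu` are illustrative only. The page does not say which BLEU variant or tokenization each entry used, so scores from this sketch are not necessarily comparable with the numbers listed here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Both arguments are lists of token lists. No smoothing is applied, so the
    score is 0.0 whenever some n-gram order has no matches at all.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions (computed in log space).
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: applies when the hypotheses are shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)
```

For example, `corpus_bleu([hyp_tokens], [ref_tokens])` scores a single sentence pair. For reported results, an established tool with a documented tokenization is the safer choice, since small preprocessing differences can shift BLEU noticeably.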
Results
| # | Model | BLEU score | Extra Data | Paper | Date | Code | SOTA |
|---|---|---|---|---|---|---|---|
| 1 | Transformer Cycle (Rev) | 35.14 | No | Lessons on Parameter Sharing across Layers in Transformers | 2021-04-13 | Code | Yes |
| 2 | Noisy back-translation | 35 | Yes | Understanding Back-Translation at Scale | 2018-08-28 | Code | Yes |
| 3 | Transformer+Rep(Uni) | 33.89 | No | Rethinking Perturbations in Encoder-Decoders for Fast Training | 2021-04-05 | Code | No |
| 4 | T5-11B | 32.1 | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code | No |
| 5 | BiBERT | 31.26 | No | BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation | 2021-09-09 | Code | No |
| 6 | Transformer + R-Drop | 30.91 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Code | No |
| 7 | Bi-SimCut | 30.78 | No | Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | 2022-06-06 | Code | No |
| 8 | BERT-fused NMT | 30.75 | No | Incorporating BERT into Neural Machine Translation | 2020-02-17 | Code | No |
| 9 | Data Diversification - Transformer | 30.7 | No | Data Diversification: A Simple Strategy For Neural Machine Translation | 2019-11-05 | Code | No |
| 10 | SimCut | 30.56 | No | Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | 2022-06-06 | Code | No |
| 11 | Mask Attention Network (big) | 30.4 | No | Mask Attention Networks: Rethinking and Strengthen Transformer | 2021-03-25 | Code | No |
| 12 | Transformer (ADMIN init) | 30.1 | No | Very Deep Transformers for Neural Machine Translation | 2020-08-18 | Code | No |
| 13 | PowerNorm (Transformer) | 30.1 | No | PowerNorm: Rethinking Batch Normalization in Transformers | 2020-03-17 | Code | No |
| 14 | Depth Growing | 30.07 | No | Depth Growing for Neural Machine Translation | 2019-07-03 | Code | No |
| 15 | MUSE(Parallel Multi-scale Attention) | 29.9 | No | MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning | 2019-11-17 | Code | No |
| 16 | Evolved Transformer Big | 29.8 | No | The Evolved Transformer | 2019-01-30 | Code | No |
| 17 | OmniNetP | 29.8 | No | OmniNet: Omnidirectional Representations from Transformers | 2021-03-01 | Code | No |
| 18 | DynamicConv | 29.7 | No | Pay Less Attention with Lightweight and Dynamic Convolutions | 2019-01-29 | Code | No |
| 19 | Local Joint Self-attention | 29.7 | No | Joint Source-Target Self Attention with Locality Constraints | 2019-05-16 | Code | No |
| 20 | TaLK Convolutions | 29.6 | No | Time-aware Large Kernel Convolutions | 2020-02-08 | Code | No |
| 21 | Transformer Big + MoS | 29.6 | No | Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation | 2018-09-25 | Code | No |
| 22 | AdvAug (aut+adv) | 29.57 | No | AdvAug: Robust Adversarial Augmentation for Neural Machine Translation | 2020-06-21 | - | No |
| 23 | PartialFormer | 29.56 | No | PartialFormer: Modeling Part Instead of Whole for Machine Translation | 2023-10-23 | Code | No |
| 24 | Transformer Big + adversarial MLE | 29.52 | No | Improving Neural Language Modeling via Adversarial Training | 2019-06-10 | Code | No |
| 25 | Transformer Big | 29.3 | No | Scaling Neural Machine Translation | 2018-06-01 | Code | Yes |
| 26 | Subformer-xlarge | 29.3 | No | - | - | - | No |
| 27 | SB-NMT | 29.21 | No | Synchronous Bidirectional Neural Machine Translation | 2019-05-13 | Code | No |
| 28 | Transformer (big) + Relative Position Representations | 29.2 | No | Self-Attention with Relative Position Representations | 2018-03-06 | Code | Yes |
| 29 | FLOATER-large | 29.2 | No | Learning to Encode Position for Transformer with Continuous Dynamical Model | 2020-03-13 | Code | No |
| 30 | Local Transformer | 29.2 | No | Modeling Localness for Self-Attention Networks | 2018-10-24 | - | No |
| 31 | Transformer Big with FRAGE | 29.11 | No | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Code | No |
| 32 | Mask Attention Network (base) | 29.1 | No | Mask Attention Networks: Rethinking and Strengthen Transformer | 2021-03-25 | Code | No |
| 33 | Mega | 29.01 | No | Mega: Moving Average Equipped Gated Attention | 2022-09-21 | Code | No |
| 34 | adequacy-oriented NMT | 28.99 | No | Neural Machine Translation with Adequacy-Oriented Learning | 2018-11-21 | - | No |
| 35 | LightConv | 28.9 | No | Pay Less Attention with Lightweight and Dynamic Convolutions | 2019-01-29 | Code | No |
| 36 | Weighted Transformer (large) | 28.9 | No | Weighted Transformer Network for Machine Translation | 2017-11-06 | Code | Yes |
| 37 | universal transformer base | 28.9 | No | Universal Transformers | 2018-07-10 | Code | No |
| 38 | KERMIT | 28.7 | No | KERMIT: Generative Insertion-Based Modeling for Sequences | 2019-06-04 | - | No |
| 39 | T2R + Pretrain | 28.7 | No | Finetuning Pretrained Transformers into RNNs | 2021-03-24 | Code | No |
| 40 | AdvAug (aut) | 28.58 | No | AdvAug: Robust Adversarial Augmentation for Neural Machine Translation | 2020-06-21 | - | No |
| 41 | RNMT+ | 28.5 | No | The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation | 2018-04-26 | Code | No |
| 42 | Synthesizer (Random + Vanilla) | 28.47 | No | Synthesizer: Rethinking Self-Attention in Transformer Models | 2020-05-02 | Code | No |
| 43 | Hardware Aware Transformer | 28.4 | No | HAT: Hardware-Aware Transformers for Efficient Natural Language Processing | 2020-05-28 | Code | No |
| 44 | Transformer Big | 28.4 | No | Attention Is All You Need | 2017-06-12 | Code | Yes |
| 45 | Transformer + SRU | 28.4 | No | Simple Recurrent Units for Highly Parallelizable Recurrence | 2017-09-08 | Code | No |
| 46 | Evolved Transformer Base | 28.4 | No | The Evolved Transformer | 2019-01-30 | Code | No |
| 47 | Rfa-Gate-arccos | 28.2 | No | Random Feature Attention | 2021-03-03 | - | No |
| 48 | Transformer-DRILL Base | 28.1 | No | Deep Residual Output Layers for Neural Language Generation | 2019-05-14 | Code | No |
| 49 | AdvAug (mixup) | 28.08 | No | AdvAug: Robust Adversarial Augmentation for Neural Machine Translation | 2020-06-21 | - | No |
| 50 | CMLM+LAT+4 iterations | 27.35 | No | Incorporating a Local Translation Mechanism into Non-autoregressive Translation | 2020-11-12 | Code | No |
| 51 | Transformer Base | 27.3 | No | Attention Is All You Need | 2017-06-12 | Code | No |
| 52 | Levenshtein Transformer (distillation) | 27.27 | No | Levenshtein Transformer | 2019-05-27 | Code | No |
| 53 | DisCo + Mask-Predict (non-autoregressive) | 27.06 | No | - | - | Code | No |
| 54 | Adaptively Sparse Transformer (alpha-entmax) | 26.93 | No | Adaptively Sparse Transformers | 2019-08-30 | Code | No |
| 55 | ResMLP-12 | 26.8 | No | ResMLP: Feedforward networks for image classification with data-efficient training | 2021-05-07 | Code | No |
| 56 | CNAT | 26.6 | No | Non-Autoregressive Translation by Learning Target Categorical Codes | 2021-03-21 | Code | No |
| 57 | Lite Transformer | 26.5 | No | Lite Transformer with Long-Short Range Attention | 2020-04-24 | Code | No |
| 58 | ConvS2S (ensemble) | 26.4 | No | Convolutional Sequence to Sequence Learning | 2017-05-08 | Code | Yes |
| 59 | ResMLP-6 | 26.4 | No | ResMLP: Feedforward networks for image classification with data-efficient training | 2021-05-07 | Code | No |
| 60 | Average Attention Network | 26.31 | No | Accelerating Neural Transformer via an Average Attention Network | 2018-05-02 | Code | No |
| 61 | GNMT+RL | 26.3 | No | Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | 2016-09-26 | Code | Yes |
| 62 | SliceNet | 26.1 | No | Depthwise Separable Convolutions for Neural Machine Translation | 2017-06-09 | Code | No |
| 63 | Average Attention Network (w/o FFN) | 26.05 | No | Accelerating Neural Transformer via an Average Attention Network | 2018-05-02 | Code | No |
| 64 | MoE | 26.03 | No | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | 2017-01-23 | Code | No |
| 65 | Average Attention Network (w/o gate) | 25.91 | No | Accelerating Neural Transformer via an Average Attention Network | 2018-05-02 | Code | No |
| 66 | Adaptively Sparse Transformer (1.5-entmax) | 25.89 | No | Adaptively Sparse Transformers | 2019-08-30 | Code | No |
| 67 | DenseNMT | 25.52 | No | Dense Information Flow for Neural Machine Translation | 2018-06-03 | Code | No |
| 68 | GLAT | 25.21 | No | Glancing Transformer for Non-Autoregressive Neural Machine Translation | 2020-08-18 | Code | No |
| 69 | CMLM+LAT+1 iterations | 25.2 | No | Incorporating a Local Translation Mechanism into Non-autoregressive Translation | 2020-11-12 | Code | No |
| 70 | ConvS2S | 25.16 | No | Convolutional Sequence to Sequence Learning | 2017-05-08 | Code | No |
| 71 | ByteNet | 23.75 | No | Neural Machine Translation in Linear Time | 2016-10-31 | Code | No |
| 72 | FlowSeq-large (NPD n = 30) | 23.64 | No | FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | 2019-09-05 | Code | No |
| 73 | FlowSeq-large (NPD n = 15) | 23.14 | No | FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | 2019-09-05 | Code | No |
| 74 | FlowSeq-large (IWD n = 15) | 22.94 | No | FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | 2019-09-05 | Code | No |
| 75 | Denoising autoencoders (non-autoregressive) | 21.54 | No | Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement | 2018-02-19 | Code | No |
| 76 | RNN Enc-Dec Att | 20.9 | No | Effective Approaches to Attention-based Neural Machine Translation | 2015-08-17 | Code | Yes |
| 77 | FlowSeq-large | 20.85 | No | FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | 2019-09-05 | Code | No |
| 78 | PBMT | 20.7 | No | - | - | - | No |
| 79 | Deep-Att | 20.7 | No | Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation | 2016-06-14 | Code | No |
| 80 | Phrase Based MT | 20.7 | No | - | - | - | No |
| 81 | PBSMT + NMT | 20.23 | No | Phrase-Based & Neural Unsupervised Machine Translation | 2018-04-20 | Code | No |
| 82 | NAT +FT + NPD | 19.17 | No | Non-Autoregressive Neural Machine Translation | 2017-11-07 | Code | No |
| 83 | FlowSeq-base | 18.55 | No | FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | 2019-09-05 | Code | No |
| 84 | Seq-KD + Seq-Inter + Word-KD | 18.5 | No | Sequence-Level Knowledge Distillation | 2016-06-25 | Code | No |
| 85 | Unsupervised PBSMT | 17.94 | No | Phrase-Based & Neural Unsupervised Machine Translation | 2018-04-20 | Code | No |
| 86 | NSE-NSE | 17.9 | No | Neural Semantic Encoders | 2016-07-14 | Code | No |
| 87 | Unsupervised NMT + Transformer | 17.16 | No | Phrase-Based & Neural Unsupervised Machine Translation | 2018-04-20 | Code | No |
| 88 | SMT + iterative backtranslation (unsupervised) | 14.08 | No | Unsupervised Statistical Machine Translation | 2018-09-04 | Code | No |
| 89 | Reverse RNN Enc-Dec | 14 | No | Effective Approaches to Attention-based Neural Machine Translation | 2015-08-17 | Code | No |
| 90 | RNN Enc-Dec | 11.3 | No | Effective Approaches to Attention-based Neural Machine Translation | 2015-08-17 | Code | No |