Machine Translation on IWSLT2014 German-English
Metric: BLEU score (higher is better)
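The page does not say which BLEU variant or tokenization each submission used, so scores from different papers may not be strictly comparable. As a rough illustration of what the metric measures, here is a minimal sketch of corpus-level BLEU. It assumes pre-tokenized input, one reference per hypothesis, n-grams up to order 4, and no smoothing. The names `ngrams` and `corpus_bleu` are hypothetical helpers written for this example and do not come from any leaderboard tooling.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (single reference, no smoothing)."""
    matches = [0] * max_n  # clipped n-gram matches, one slot per order
    totals = [0] * max_n   # hypothesis n-gram counts, one slot per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Counter & Counter keeps the minimum count per n-gram (clipping).
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any order with zero matches drives the score to 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


hyp = ["the cat sat on the mat".split()]
ref = ["the cat sat on a mat".split()]
score = corpus_bleu(hyp, ref)
```

For reported results, an established tool with a documented tokenization is the safer choice; this sketch only shows the clipped n-gram precision and brevity penalty that make up the score.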
Results
| # | Model | BLEU score | Extra Data | Paper | Date | Code | Badge |
|---|-------|------------|------------|-------|------|------|-------|
| 1 | PiNMT | 40.43 | No | Integrating Pre-trained Language Model into Neural Machine Translation | 2023-10-30 | - | SOTA |
| 2 | BiBERT | 38.61 | No | BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation | 2021-09-09 | Code | SOTA |
| 3 | Bi-SimCut | 38.37 | No | Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | 2022-06-06 | Code | |
| 4 | Cutoff + Relaxed Attention + LM | 37.96 | Yes | Relaxed Attention for Transformer Models | 2022-09-20 | Code | |
| 5 | DRDA | 37.95 | No | Deterministic Reversible Data Augmentation for Neural Machine Translation | 2024-06-04 | Code | |
| 6 | Transformer + R-Drop + Cutoff | 37.9 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Code | SOTA |
| 7 | SimCut | 37.81 | No | Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | 2022-06-06 | Code | |
| 8 | Cutoff+Knee | 37.78 | No | Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule | 2020-03-09 | Code | SOTA |
| 9 | Cutoff | 37.6 | No | A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation | 2020-09-29 | Code | |
| 10 | CipherDAug | 37.53 | No | CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation | 2022-04-01 | Code | |
| 11 | Transformer + R-Drop | 37.25 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Code | |
| 12 | Data Diversification | 37.2 | No | Data Diversification: A Simple Strategy For Neural Machine Translation | 2019-11-05 | Code | SOTA |
| 13 | UniDrop | 36.88 | No | UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost | 2021-04-11 | - | |
| 14 | MixedRepresentations | 36.41 | No | No paper | - | Code | |
| 15 | Mask Attention Network (small) | 36.3 | No | Mask Attention Networks: Rethinking and Strengthen Transformer | 2021-03-25 | Code | |
| 16 | MUSE(Parallel Multi-scale Attention) | 36.3 | No | MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning | 2019-11-17 | Code | |
| 17 | Transformer+Rep(Sim)+WDrop | 36.22 | No | Rethinking Perturbations in Encoder-Decoders for Fast Training | 2021-04-05 | Code | |
| 18 | MAT | 36.22 | No | Multi-branch Attentive Transformer | 2020-06-18 | Code | |
| 19 | TransformerBase + AutoDropout | 35.8 | No | AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | 2021-01-05 | Code | |
| 20 | Local Joint Self-attention | 35.7 | No | Joint Source-Target Self Attention with Locality Constraints | 2019-05-16 | Code | SOTA |
| 21 | TaLK Convolutions | 35.5 | No | Time-aware Large Kernel Convolutions | 2020-02-08 | Code | |
| 22 | ImitKD + Full | 35.4 | No | Autoregressive Knowledge Distillation through Imitation Learning | 2020-09-15 | Code | |
| 23 | DeLighT | 35.3 | No | DeLighT: Deep and Light-weight Transformer | 2020-08-03 | Code | |
| 24 | DynamicConv | 35.2 | No | Pay Less Attention with Lightweight and Dynamic Convolutions | 2019-01-29 | Code | SOTA |
| 25 | Transformer | 35.1385 | No | Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks | 2022-05-15 | - | |
| 26 | LightConv | 34.8 | No | Pay Less Attention with Lightweight and Dynamic Convolutions | 2019-01-29 | Code | |
| 27 | Transformer | 34.44 | No | Attention Is All You Need | 2017-06-12 | Code | SOTA |
| 28 | Rfa-Gate-arccos | 34.4 | No | Random Feature Attention | 2021-03-03 | - | |
| 29 | Variational Attention | 33.1 | No | Latent Alignment and Variational Attention | 2018-07-10 | Code | |
| 30 | Minimum Risk Training [Edunov2017] | 32.84 | No | Classical Structured Prediction Losses for Sequence to Sequence Learning | 2017-11-14 | Code | |
| 31 | CNAT | 31.15 | No | Non-Autoregressive Translation by Learning Target Categorical Codes | 2021-03-21 | Code | |
| 32 | Neural PBMT + LM [Huang2018] | 30.08 | No | Towards Neural Phrase-based Machine Translation | 2017-06-17 | Code | |
| 33 | Back-Translation Finetuning | 28.83 | Yes | Tag-less Back-Translation | 2019-12-22 | - | |
| 34 | Actor-Critic [Bahdanau2017] | 28.53 | No | An Actor-Critic Algorithm for Sequence Prediction | 2016-07-24 | Code | SOTA |