Machine Translation on IWSLT2014 German-English
Metric: BLEU score (higher is better)
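Entries on this leaderboard report corpus-level BLEU on the IWSLT2014 German-English test set. As a minimal sketch of how such a score is computed, the snippet below uses the sacrebleu library on illustrative hypothesis/reference strings (not actual system output); note that papers differ in tokenization and casing conventions, so scores produced by different evaluation scripts are not always directly comparable.

```python
# Minimal sketch: corpus-level BLEU with sacrebleu (pip install sacrebleu).
# The sentences below are illustrative placeholders, not IWSLT2014 data.
import sacrebleu

hypotheses = [
    "the cat sat on the mat",
    "he read the book yesterday",
]
references = [[  # one reference stream, aligned line-by-line with the hypotheses
    "the cat sat on the mat",
    "he read the book yesterday evening",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")  # 0-100 scale; higher is better
```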
Results
| # | Model | BLEU score | Extra Data | Paper | Date | Code |
|---|-------|------------|------------|-------|------|------|
| 1 | PiNMT | 40.43 | No | Integrating Pre-trained Language Model into Neur... | 2023-10-30 | - |
| 2 | BiBERT | 38.61 | No | BERT, mBERT, or BiBERT? A Study on Contextualize... | 2021-09-09 | Yes |
| 3 | Bi-SimCut | 38.37 | No | Bi-SimCut: A Simple Strategy for Boosting Neural... | 2022-06-06 | Yes |
| 4 | Cutoff + Relaxed Attention + LM | 37.96 | Yes | Relaxed Attention for Transformer Models | 2022-09-20 | Yes |
| 5 | DRDA | 37.95 | No | Deterministic Reversible Data Augmentation for N... | 2024-06-04 | Yes |
| 6 | Transformer + R-Drop + Cutoff | 37.9 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Yes |
| 7 | SimCut | 37.81 | No | Bi-SimCut: A Simple Strategy for Boosting Neural... | 2022-06-06 | Yes |
| 8 | Cutoff+Knee | 37.78 | No | Wide-minima Density Hypothesis and the Explore-E... | 2020-03-09 | Yes |
| 9 | Cutoff | 37.6 | No | A Simple but Tough-to-Beat Data Augmentation App... | 2020-09-29 | Yes |
| 10 | CipherDAug | 37.53 | No | CipherDAug: Ciphertext based Data Augmentation f... | 2022-04-01 | Yes |
| 11 | Transformer + R-Drop | 37.25 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Yes |
| 12 | Data Diversification | 37.2 | No | Data Diversification: A Simple Strategy For Neur... | 2019-11-05 | Yes |
| 13 | UniDrop | 36.88 | No | UniDrop: A Simple yet Effective Technique to Imp... | 2021-04-11 | - |
| 14 | MixedRepresentations | 36.41 | No | - | - | Yes |
| 15 | Mask Attention Network (small) | 36.3 | No | Mask Attention Networks: Rethinking and Strength... | 2021-03-25 | Yes |
| 16 | MUSE (Parallel Multi-scale Attention) | 36.3 | No | MUSE: Parallel Multi-Scale Attention for Sequenc... | 2019-11-17 | Yes |
| 17 | Transformer+Rep(Sim)+WDrop | 36.22 | No | Rethinking Perturbations in Encoder-Decoders for... | 2021-04-05 | Yes |
| 18 | MAT | 36.22 | No | Multi-branch Attentive Transformer | 2020-06-18 | Yes |
| 19 | TransformerBase + AutoDropout | 35.8 | No | AutoDropout: Learning Dropout Patterns to Regula... | 2021-01-05 | Yes |
| 20 | Local Joint Self-attention | 35.7 | No | Joint Source-Target Self Attention with Locality... | 2019-05-16 | Yes |
| 21 | TaLK Convolutions | 35.5 | No | Time-aware Large Kernel Convolutions | 2020-02-08 | Yes |
| 22 | ImitKD + Full | 35.4 | No | Autoregressive Knowledge Distillation through Im... | 2020-09-15 | Yes |
| 23 | DeLighT | 35.3 | No | DeLighT: Deep and Light-weight Transformer | 2020-08-03 | Yes |
| 24 | DynamicConv | 35.2 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Yes |
| 25 | Transformer | 35.1385 | No | Guidelines for the Regularization of Gammas in B... | 2022-05-15 | - |
| 26 | LightConv | 34.8 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Yes |
| 27 | Transformer | 34.44 | No | Attention Is All You Need | 2017-06-12 | Yes |
| 28 | Rfa-Gate-arccos | 34.4 | No | Random Feature Attention | 2021-03-03 | - |
| 29 | Variational Attention | 33.1 | No | Latent Alignment and Variational Attention | 2018-07-10 | Yes |
| 30 | Minimum Risk Training [Edunov2017] | 32.84 | No | Classical Structured Prediction Losses for Seque... | 2017-11-14 | Yes |
| 31 | CNAT | 31.15 | No | Non-Autoregressive Translation by Learning Targe... | 2021-03-21 | Yes |
| 32 | Neural PBMT + LM [Huang2018] | 30.08 | No | Towards Neural Phrase-based Machine Translation | 2017-06-17 | Yes |
| 33 | Back-Translation Finetuning | 28.83 | Yes | Tag-less Back-Translation | 2019-12-22 | - |
| 34 | Actor-Critic [Bahdanau2017] | 28.53 | No | An Actor-Critic Algorithm for Sequence Prediction | 2016-07-24 | Yes |