Machine Translation on WMT2014 English-French
Metric: BLEU score (higher is better)
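BLEU combines clipped n-gram precisions (n = 1 to 4) through a geometric mean and multiplies the result by a brevity penalty that punishes translations shorter than the references. The following is a minimal corpus-level sketch. It assumes pre-tokenized input, one reference per segment, uniform weights and no smoothing. The function names are hypothetical, and this is not the scoring script behind any entry below: tokenization and casing choices change the number.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale, one reference per segment.

    Assumes pre-tokenized input, uniform n-gram weights and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if min(matches) == 0:
        return 0.0  # a zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if the output is longer than the references, else exp(1 - r/c).
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


ref = "le chat est assis sur le tapis".split()
print(corpus_bleu([ref], [ref]))                                  # identical output
print(corpus_bleu(["le chat est assis sur le".split()], [ref]))   # one word short
```

The second call has perfect n-gram precision but is one token shorter than the reference. It therefore scores below 100 only because of the brevity penalty.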
Results
| # | Model | BLEU | Extra training data | Paper | Date | Code | SOTA |
|---|---|---|---|---|---|---|---|
| 1 | Transformer+BT (ADMIN init) | 46.4 | Yes | Very Deep Transformers for Neural Machine Translation | 2020-08-18 | Yes | Yes |
| 2 | Noisy back-translation | 45.6 | Yes | Understanding Back-Translation at Scale | 2018-08-28 | Yes | Yes |
| 3 | mRASP+Fine-Tune | 44.3 | Yes | Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information | 2020-10-07 | Yes | |
| 4 | Transformer + R-Drop | 43.95 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Yes | |
| 5 | Transformer (ADMIN init) | 43.8 | No | Very Deep Transformers for Neural Machine Translation | 2020-08-18 | Yes | |
| 6 | Admin | 43.8 | No | Understanding the Difficulty of Training Transformers | 2020-04-17 | Yes | |
| 7 | BERT-fused NMT | 43.78 | Yes | Incorporating BERT into Neural Machine Translation | 2020-02-17 | Yes | |
| 8 | MUSE (Parallel Multi-scale Attention) | 43.5 | No | MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning | 2019-11-17 | Yes | |
| 9 | T5 | 43.4 | Yes | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes | |
| 10 | Local Joint Self-attention | 43.3 | No | Joint Source-Target Self Attention with Locality Constraints | 2019-05-16 | Yes | |
| 11 | Depth Growing | 43.27 | No | Depth Growing for Neural Machine Translation | 2019-07-03 | Yes | |
| 12 | Transformer Big | 43.2 | No | Scaling Neural Machine Translation | 2018-06-01 | Yes | Yes |
| 13 | DynamicConv | 43.2 | No | Pay Less Attention with Lightweight and Dynamic Convolutions | 2019-01-29 | Yes | |
| 14 | TaLK Convolutions | 43.2 | No | Time-aware Large Kernel Convolutions | 2020-02-08 | Yes | |
| 15 | LightConv | 43.1 | No | Pay Less Attention with Lightweight and Dynamic Convolutions | 2019-01-29 | Yes | |
| 16 | FLOATER-large | 42.7 | No | Learning to Encode Position for Transformer with Continuous Dynamical Model | 2020-03-13 | Yes | |
| 17 | OmniNetP | 42.6 | No | OmniNet: Omnidirectional Representations from Transformers | 2021-03-01 | Yes | |
| 18 | Transformer Big + MoS | 42.1 | No | Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation | 2018-09-25 | Yes | |
| 19 | T2R + Pretrain | 42.1 | No | Finetuning Pretrained Transformers into RNNs | 2021-03-24 | Yes | |
| 20 | Synthesizer (Random + Vanilla) | 41.85 | No | Synthesizer: Rethinking Self-Attention in Transformer Models | 2020-05-02 | Yes | |
| 21 | Hardware Aware Transformer | 41.8 | No | HAT: Hardware-Aware Transformers for Efficient Natural Language Processing | 2020-05-28 | Yes | |
| 22 | Transformer (big) + Relative Position Representations | 41.5 | No | Self-Attention with Relative Position Representations | 2018-03-06 | Yes | Yes |
| 23 | Stack 4-layer RNNSearch + Dual Learning + Deliberation Network | 41.5 | No | - | - | - | |
| 24 | Weighted Transformer (large) | 41.4 | No | Weighted Transformer Network for Machine Translation | 2017-11-06 | Yes | Yes |
| 25 | ConvS2S (ensemble) | 41.3 | No | Convolutional Sequence to Sequence Learning | 2017-05-08 | Yes | Yes |
| 26 | Evolved Transformer Big | 41.3 | No | The Evolved Transformer | 2019-01-30 | Yes | |
| 27 | RNMT+ | 41 | No | The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation | 2018-04-26 | Yes | |
| 28 | Transformer Big | 41 | Yes | Attention Is All You Need | 2017-06-12 | Yes | |
| 29 | Evolved Transformer Base | 40.6 | No | The Evolved Transformer | 2019-01-30 | Yes | |
| 30 | ResMLP-12 | 40.6 | No | ResMLP: Feedforward networks for image classification with data-efficient training | 2021-05-07 | Yes | |
| 31 | MoE | 40.56 | No | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | 2017-01-23 | Yes | Yes |
| 32 | Transformer | 40.5 | No | Memory-Efficient Adaptive Optimization | 2019-01-30 | Yes | |
| 33 | ConvS2S | 40.46 | No | Convolutional Sequence to Sequence Learning | 2017-05-08 | Yes | |
| 34 | ResMLP-6 | 40.3 | No | ResMLP: Feedforward networks for image classification with data-efficient training | 2021-05-07 | Yes | |
| 35 | TransformerBase + AutoDropout | 40 | No | AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | 2021-01-05 | Yes | |
| 36 | GNMT+RL | 39.9 | No | Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | 2016-09-26 | Yes | Yes |
| 37 | Lite Transformer | 39.6 | No | Lite Transformer with Long-Short Range Attention | 2020-04-24 | Yes | |
| 38 | Deep-Att + PosUnk | 39.2 | No | Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation | 2016-06-14 | Yes | Yes |
| 39 | Rfa-Gate-arccos | 39.2 | No | Random Feature Attention | 2021-03-03 | - | |
| 40 | Transformer Base | 38.1 | No | Attention Is All You Need | 2017-06-12 | Yes | |
| 41 | LSTM6 + PosUnk | 37.5 | No | Addressing the Rare Word Problem in Neural Machine Translation | 2014-10-30 | Yes | Yes |
| 42 | PBMT | 37 | No | - | - | - | |
| 43 | SMT+LSTM5 | 36.5 | No | Sequence to Sequence Learning with Neural Networks | 2014-09-10 | Yes | Yes |
| 44 | RNN-search50* | 36.2 | No | Neural Machine Translation by Jointly Learning to Align and Translate | 2014-09-01 | Yes | Yes |
| 45 | Deep-Att | 35.9 | No | Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation | 2016-06-14 | Yes | |
| 46 | Deep Convolutional Encoder; single-layer decoder | 35.7 | No | A Convolutional Encoder Model for Neural Machine Translation | 2016-11-07 | Yes | |
| 47 | LSTM | 34.8 | No | Sequence to Sequence Learning with Neural Networks | 2014-09-10 | Yes | |
| 48 | CSLM + RNN + WP | 34.54 | No | Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation | 2014-06-03 | Yes | Yes |
| 49 | FLAN 137B (zero-shot) | 33.9 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes | |
| 50 | FLAN 137B (few-shot, k=9) | 33.8 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes | |
| 51 | Regularized LSTM | 29.03 | No | Recurrent Neural Network Regularization | 2014-09-08 | Yes | |
| 52 | Unsupervised PBSMT | 28.11 | No | Phrase-Based & Neural Unsupervised Machine Translation | 2018-04-20 | Yes | |
| 53 | PBSMT + NMT | 27.6 | No | Phrase-Based & Neural Unsupervised Machine Translation | 2018-04-20 | Yes | |
| 54 | GRU+Attention | 26.4 | No | Can Active Memory Replace Attention? | 2016-10-27 | Yes | |
| 55 | SMT + iterative backtranslation (unsupervised) | 26.22 | No | Unsupervised Statistical Machine Translation | 2018-09-04 | Yes | |
| 56 | Unsupervised NMT + Transformer | 25.14 | No | Phrase-Based & Neural Unsupervised Machine Translation | 2018-04-20 | Yes | |
| 57 | Unsupervised attentional encoder-decoder + BPE | 14.36 | No | Unsupervised Neural Machine Translation | 2017-10-30 | Yes | |
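Every entry flagged SOTA above scores higher than all entries published before it, and every such record-setter carries the flag. The flags can therefore be reproduced by a running-maximum scan over publication dates. The following is a minimal sketch that assumes this rule. The rule is inferred from the table, not taken from any documented leaderboard logic. `record_setters` and `Entry` are hypothetical names, and the sample rows are copied from the table.

```python
from typing import List, Optional, Tuple

# (model name, publication date as ISO "YYYY-MM-DD" or None, BLEU score)
Entry = Tuple[str, Optional[str], float]


def record_setters(entries: List[Entry]) -> List[str]:
    """Return the models that set a new best BLEU score, in publication order.

    Undated entries are ignored. Within one date, higher scores are checked first,
    and a score must strictly beat the previous best to count as a new record.
    """
    dated = [e for e in entries if e[1] is not None]
    # ISO dates sort correctly as strings; -score puts the best same-day entry first.
    dated.sort(key=lambda e: (e[1], -e[2]))
    best = float("-inf")
    records: List[str] = []
    for model, _date, score in dated:
        if score > best:
            records.append(model)
            best = score
    return records


sample: List[Entry] = [
    ("CSLM + RNN + WP", "2014-06-03", 34.54),
    ("RNN-search50*", "2014-09-01", 36.2),
    ("Regularized LSTM", "2014-09-08", 29.03),
    ("GNMT+RL", "2016-09-26", 39.9),
    ("ConvS2S (ensemble)", "2017-05-08", 41.3),
    ("Transformer Big", "2017-06-12", 41.0),
    ("PBMT", None, 37.0),
]
print(record_setters(sample))
```

In this sample, Transformer Big (2017-06-12, 41.0) is not a record because ConvS2S (ensemble) had already reached 41.3 a month earlier. This matches its missing flag in the table.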