| Rank | Model | BLEU | Extra Training Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Transformer+BT (ADMIN init) | 46.4 | Yes | Very Deep Transformers for Neural Machine Translation | 2020-08-18 | Code |
| 2 | Noisy back-translation | 45.6 | Yes | Understanding Back-Translation at Scale | 2018-08-28 | Code |
| 3 | mRASP+Fine-Tune | 44.3 | Yes | Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information | 2020-10-07 | Code |
| 4 | Transformer + R-Drop | 43.95 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Code |
| 5 | Transformer (ADMIN init) | 43.8 | No | Very Deep Transformers for Neural Machine Translation | 2020-08-18 | Code |
| 6 | Admin | 43.8 | No | Understanding the Difficulty of Training Transformers | 2020-04-17 | Code |
| 7 | BERT-fused NMT | 43.78 | Yes | Incorporating BERT into Neural Machine Translation | 2020-02-17 | Code |
| 8 | MUSE (Parallel Multi-scale Attention) | 43.5 | No | MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning | 2019-11-17 | Code |
| 9 | T5 | 43.4 | Yes | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 10 | Local Joint Self-attention | 43.3 | No | Joint Source-Target Self Attention with Locality Constraints | 2019-05-16 | Code |
| 11 | Depth Growing | 43.27 | No | Depth Growing for Neural Machine Translation | 2019-07-03 | Code |
| 12 | Transformer Big | 43.2 | No | Scaling Neural Machine Translation | 2018-06-01 | Code |
| 13 | DynamicConv | 43.2 | No | Pay Less Attention with Lightweight and Dynamic Convolutions | 2019-01-29 | Code |
| 14 | TaLK Convolutions | 43.2 | No | Time-aware Large Kernel Convolutions | 2020-02-08 | Code |
| 15 | LightConv | 43.1 | No | Pay Less Attention with Lightweight and Dynamic Convolutions | 2019-01-29 | Code |
| 16 | FLOATER-large | 42.7 | No | Learning to Encode Position for Transformer with Continuous Dynamical Model | 2020-03-13 | Code |
| 17 | OmniNetP | 42.6 | No | OmniNet: Omnidirectional Representations from Transformers | 2021-03-01 | Code |
| 18 | Transformer Big + MoS | 42.1 | No | Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation | 2018-09-25 | Code |
| 19 | T2R + Pretrain | 42.1 | No | Finetuning Pretrained Transformers into RNNs | 2021-03-24 | Code |
| 20 | Synthesizer (Random + Vanilla) | 41.85 | No | Synthesizer: Rethinking Self-Attention in Transformer Models | 2020-05-02 | Code |
| 21 | Hardware Aware Transformer | 41.8 | No | HAT: Hardware-Aware Transformers for Efficient Natural Language Processing | 2020-05-28 | Code |
| 22 | Transformer (big) + Relative Position Representations | 41.5 | No | Self-Attention with Relative Position Representations | 2018-03-06 | Code |
| 23 | Stack 4-layer RNNSearch + Dual Learning + Deliberation Network | 41.5 | No | - | - | - |
| 24 | Weighted Transformer (large) | 41.4 | No | Weighted Transformer Network for Machine Translation | 2017-11-06 | Code |
| 25 | ConvS2S (ensemble) | 41.3 | No | Convolutional Sequence to Sequence Learning | 2017-05-08 | Code |
| 26 | Evolved Transformer Big | 41.3 | No | The Evolved Transformer | 2019-01-30 | Code |
| 27 | RNMT+ | 41.0 | No | The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation | 2018-04-26 | Code |
| 28 | Transformer Big | 41.0 | Yes | Attention Is All You Need | 2017-06-12 | Code |
| 29 | Evolved Transformer Base | 40.6 | No | The Evolved Transformer | 2019-01-30 | Code |
| 30 | ResMLP-12 | 40.6 | No | ResMLP: Feedforward networks for image classification with data-efficient training | 2021-05-07 | Code |
| 31 | MoE | 40.56 | No | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | 2017-01-23 | Code |
| 32 | Transformer | 40.5 | No | Memory-Efficient Adaptive Optimization | 2019-01-30 | Code |
| 33 | ConvS2S | 40.46 | No | Convolutional Sequence to Sequence Learning | 2017-05-08 | Code |
| 34 | ResMLP-6 | 40.3 | No | ResMLP: Feedforward networks for image classification with data-efficient training | 2021-05-07 | Code |
| 35 | TransformerBase + AutoDropout | 40.0 | No | AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | 2021-01-05 | Code |
| 36 | GNMT+RL | 39.9 | No | Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | 2016-09-26 | Code |
| 37 | Lite Transformer | 39.6 | No | Lite Transformer with Long-Short Range Attention | 2020-04-24 | Code |
| 38 | Deep-Att + PosUnk | 39.2 | No | Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation | 2016-06-14 | Code |
| 39 | Rfa-Gate-arccos | 39.2 | No | Random Feature Attention | 2021-03-03 | - |
| 40 | Transformer Base | 38.1 | No | Attention Is All You Need | 2017-06-12 | Code |
| 41 | LSTM6 + PosUnk | 37.5 | No | Addressing the Rare Word Problem in Neural Machine Translation | 2014-10-30 | Code |
| 42 | PBMT | 37.0 | No | - | - | - |
| 43 | SMT+LSTM5 | 36.5 | No | Sequence to Sequence Learning with Neural Networks | 2014-09-10 | Code |
| 44 | RNN-search50* | 36.2 | No | Neural Machine Translation by Jointly Learning to Align and Translate | 2014-09-01 | Code |
| 45 | Deep-Att | 35.9 | No | Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation | 2016-06-14 | Code |
| 46 | Deep Convolutional Encoder; single-layer decoder | 35.7 | No | A Convolutional Encoder Model for Neural Machine Translation | 2016-11-07 | Code |
| 47 | LSTM | 34.8 | No | Sequence to Sequence Learning with Neural Networks | 2014-09-10 | Code |
| 48 | CSLM + RNN + WP | 34.54 | No | Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation | 2014-06-03 | Code |
| 49 | FLAN 137B (zero-shot) | 33.9 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 50 | FLAN 137B (few-shot, k=9) | 33.8 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 51 | Regularized LSTM | 29.03 | No | Recurrent Neural Network Regularization | 2014-09-08 | Code |
| 52 | Unsupervised PBSMT | 28.11 | No | Phrase-Based & Neural Unsupervised Machine Translation | 2018-04-20 | Code |
| 53 | PBSMT + NMT | 27.6 | No | Phrase-Based & Neural Unsupervised Machine Translation | 2018-04-20 | Code |
| 54 | GRU+Attention | 26.4 | No | Can Active Memory Replace Attention? | 2016-10-27 | Code |
| 55 | SMT + iterative backtranslation (unsupervised) | 26.22 | No | Unsupervised Statistical Machine Translation | 2018-09-04 | Code |
| 56 | Unsupervised NMT + Transformer | 25.14 | No | Phrase-Based & Neural Unsupervised Machine Translation | 2018-04-20 | Code |
| 57 | Unsupervised attentional encoder-decoder + BPE | 14.36 | No | Unsupervised Neural Machine Translation | 2017-10-30 | Code |
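
All entries above are ranked by corpus-level BLEU as reported in each paper; tokenization and evaluation scripts vary between papers, so small score differences are not strictly comparable. As a minimal sketch of how such a score is typically computed today, the snippet below uses the sacreBLEU library on a toy hypothesis/reference pair; the sentences are illustrative placeholders, not outputs of any system in the table.

```python
# Minimal BLEU-scoring sketch with sacreBLEU (pip install sacrebleu).
# The hypotheses/references below are illustrative placeholders, not
# outputs of any system listed in the table above.
import sacrebleu

# One system output (hypothesis) per source sentence.
hypotheses = [
    "the cat sat on the mat",
    "he read the book because he was interested in world history",
]

# `references` is a list of reference streams: each inner list is
# parallel to `hypotheses` (add more streams for multi-reference BLEU).
references = [
    [
        "the cat sat on the mat",
        "he read the book because he was interested in world history",
    ]
]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")  # corpus-level score on a 0-100 scale
```

Reporting the sacreBLEU signature (`bleu.get_signature()`) alongside the score is the usual way to make numbers comparable across papers, since it pins down the tokenizer, casing, and smoothing used.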