| Rank | Model | BLEU | Uses Extra Training Data | Paper | Date | Code |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Transformer Cycle (Rev) | 35.14 | No | Lessons on Parameter Sharing across Layers in Tr... | 2021-04-13 | Code |
| 2 | Noisy back-translation | 35.0 | Yes | Understanding Back-Translation at Scale | 2018-08-28 | Code |
| 3 | Transformer+Rep(Uni) | 33.89 | No | Rethinking Perturbations in Encoder-Decoders for... | 2021-04-05 | Code |
| 4 | T5-11B | 32.1 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 5 | BiBERT | 31.26 | No | BERT, mBERT, or BiBERT? A Study on Contextualize... | 2021-09-09 | Code |
| 6 | Transformer + R-Drop | 30.91 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Code |
| 7 | Bi-SimCut | 30.78 | No | Bi-SimCut: A Simple Strategy for Boosting Neural... | 2022-06-06 | Code |
| 8 | BERT-fused NMT | 30.75 | No | Incorporating BERT into Neural Machine Translation | 2020-02-17 | Code |
| 9 | Data Diversification - Transformer | 30.7 | No | Data Diversification: A Simple Strategy For Neur... | 2019-11-05 | Code |
| 10 | SimCut | 30.56 | No | Bi-SimCut: A Simple Strategy for Boosting Neural... | 2022-06-06 | Code |
| 11 | Mask Attention Network (big) | 30.4 | No | Mask Attention Networks: Rethinking and Strength... | 2021-03-25 | Code |
| 12 | Transformer (ADMIN init) | 30.1 | No | Very Deep Transformers for Neural Machine Transl... | 2020-08-18 | Code |
| 13 | PowerNorm (Transformer) | 30.1 | No | PowerNorm: Rethinking Batch Normalization in Tra... | 2020-03-17 | Code |
| 14 | Depth Growing | 30.07 | No | Depth Growing for Neural Machine Translation | 2019-07-03 | Code |
| 15 | MUSE(Parallel Multi-scale Attention) | 29.9 | No | MUSE: Parallel Multi-Scale Attention for Sequenc... | 2019-11-17 | Code |
| 16 | Evolved Transformer Big | 29.8 | No | The Evolved Transformer | 2019-01-30 | Code |
| 17 | OmniNetP | 29.8 | No | OmniNet: Omnidirectional Representations from Tr... | 2021-03-01 | Code |
| 18 | DynamicConv | 29.7 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Code |
| 19 | Local Joint Self-attention | 29.7 | No | Joint Source-Target Self Attention with Locality... | 2019-05-16 | Code |
| 20 | TaLK Convolutions | 29.6 | No | Time-aware Large Kernel Convolutions | 2020-02-08 | Code |
| 21 | Transformer Big + MoS | 29.6 | No | Fast and Simple Mixture of Softmaxes with BPE an... | 2018-09-25 | Code |
| 22 | AdvAug (aut+adv) | 29.57 | No | AdvAug: Robust Adversarial Augmentation for Neur... | 2020-06-21 | - |
| 23 | PartialFormer | 29.56 | No | PartialFormer: Modeling Part Instead of Whole fo... | 2023-10-23 | Code |
| 24 | Transformer Big + adversarial MLE | 29.52 | No | Improving Neural Language Modeling via Adversari... | 2019-06-10 | Code |
| 25 | Transformer Big | 29.3 | No | Scaling Neural Machine Translation | 2018-06-01 | Code |
| 26 | Subformer-xlarge | 29.3 | No | - | - | - |
| 27 | SB-NMT | 29.21 | No | Synchronous Bidirectional Neural Machine Transla... | 2019-05-13 | Code |
| 28 | Transformer (big) + Relative Position Representations | 29.2 | No | Self-Attention with Relative Position Representa... | 2018-03-06 | Code |
| 29 | FLOATER-large | 29.2 | No | Learning to Encode Position for Transformer with... | 2020-03-13 | Code |
| 30 | Local Transformer | 29.2 | No | Modeling Localness for Self-Attention Networks | 2018-10-24 | - |
| 31 | Transformer Big with FRAGE | 29.11 | No | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Code |
| 32 | Mask Attention Network (base) | 29.1 | No | Mask Attention Networks: Rethinking and Strength... | 2021-03-25 | Code |
| 33 | Mega | 29.01 | No | Mega: Moving Average Equipped Gated Attention | 2022-09-21 | Code |
| 34 | adequacy-oriented NMT | 28.99 | No | Neural Machine Translation with Adequacy-Oriente... | 2018-11-21 | - |
| 35 | LightConv | 28.9 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Code |
| 36 | Weighted Transformer (large) | 28.9 | No | Weighted Transformer Network for Machine Transla... | 2017-11-06 | Code |
| 37 | universal transformer base | 28.9 | No | Universal Transformers | 2018-07-10 | Code |
| 38 | KERMIT | 28.7 | No | KERMIT: Generative Insertion-Based Modeling for ... | 2019-06-04 | - |
| 39 | T2R + Pretrain | 28.7 | No | Finetuning Pretrained Transformers into RNNs | 2021-03-24 | Code |
| 40 | AdvAug (aut) | 28.58 | No | AdvAug: Robust Adversarial Augmentation for Neur... | 2020-06-21 | - |
| 41 | RNMT+ | 28.5 | No | The Best of Both Worlds: Combining Recent Advanc... | 2018-04-26 | Code |
| 42 | Synthesizer (Random + Vanilla) | 28.47 | No | Synthesizer: Rethinking Self-Attention in Transf... | 2020-05-02 | Code |
| 43 | Hardware Aware Transformer | 28.4 | No | HAT: Hardware-Aware Transformers for Efficient N... | 2020-05-28 | Code |
| 44 | Transformer Big | 28.4 | No | Attention Is All You Need | 2017-06-12 | Code |
| 45 | Transformer + SRU | 28.4 | No | Simple Recurrent Units for Highly Parallelizable... | 2017-09-08 | Code |
| 46 | Evolved Transformer Base | 28.4 | No | The Evolved Transformer | 2019-01-30 | Code |
| 47 | Rfa-Gate-arccos | 28.2 | No | Random Feature Attention | 2021-03-03 | - |
| 48 | Transformer-DRILL Base | 28.1 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 49 | AdvAug (mixup) | 28.08 | No | AdvAug: Robust Adversarial Augmentation for Neur... | 2020-06-21 | - |
| 50 | CMLM+LAT+4 iterations | 27.35 | No | Incorporating a Local Translation Mechanism into... | 2020-11-12 | Code |
| 51 | Transformer Base | 27.3 | No | Attention Is All You Need | 2017-06-12 | Code |
| 52 | Levenshtein Transformer (distillation) | 27.27 | No | Levenshtein Transformer | 2019-05-27 | Code |
| 53 | DisCo + Mask-Predict (non-autoregressive) | 27.06 | No | - | - | Code |
| 54 | Adaptively Sparse Transformer (alpha-entmax) | 26.93 | No | Adaptively Sparse Transformers | 2019-08-30 | Code |
| 55 | ResMLP-12 | 26.8 | No | ResMLP: Feedforward networks for image classific... | 2021-05-07 | Code |
| 56 | CNAT | 26.6 | No | Non-Autoregressive Translation by Learning Targe... | 2021-03-21 | Code |
| 57 | Lite Transformer | 26.5 | No | Lite Transformer with Long-Short Range Attention | 2020-04-24 | Code |
| 58 | ConvS2S (ensemble) | 26.4 | No | Convolutional Sequence to Sequence Learning | 2017-05-08 | Code |
| 59 | ResMLP-6 | 26.4 | No | ResMLP: Feedforward networks for image classific... | 2021-05-07 | Code |
| 60 | Average Attention Network | 26.31 | No | Accelerating Neural Transformer via an Average A... | 2018-05-02 | Code |
| 61 | GNMT+RL | 26.3 | No | Google's Neural Machine Translation System: Brid... | 2016-09-26 | Code |
| 62 | SliceNet | 26.1 | No | Depthwise Separable Convolutions for Neural Mach... | 2017-06-09 | Code |
| 63 | Average Attention Network (w/o FFN) | 26.05 | No | Accelerating Neural Transformer via an Average A... | 2018-05-02 | Code |
| 64 | MoE | 26.03 | No | Outrageously Large Neural Networks: The Sparsely... | 2017-01-23 | Code |
| 65 | Average Attention Network (w/o gate) | 25.91 | No | Accelerating Neural Transformer via an Average A... | 2018-05-02 | Code |
| 66 | Adaptively Sparse Transformer (1.5-entmax) | 25.89 | No | Adaptively Sparse Transformers | 2019-08-30 | Code |
| 67 | DenseNMT | 25.52 | No | Dense Information Flow for Neural Machine Transl... | 2018-06-03 | Code |
| 68 | GLAT | 25.21 | No | Glancing Transformer for Non-Autoregressive Neur... | 2020-08-18 | Code |
| 69 | CMLM+LAT+1 iterations | 25.2 | No | Incorporating a Local Translation Mechanism into... | 2020-11-12 | Code |
| 70 | ConvS2S | 25.16 | No | Convolutional Sequence to Sequence Learning | 2017-05-08 | Code |
| 71 | ByteNet | 23.75 | No | Neural Machine Translation in Linear Time | 2016-10-31 | Code |
| 72 | FlowSeq-large (NPD n = 30) | 23.64 | No | FlowSeq: Non-Autoregressive Conditional Sequence... | 2019-09-05 | Code |
| 73 | FlowSeq-large (NPD n = 15) | 23.14 | No | FlowSeq: Non-Autoregressive Conditional Sequence... | 2019-09-05 | Code |
| 74 | FlowSeq-large (IWD n = 15) | 22.94 | No | FlowSeq: Non-Autoregressive Conditional Sequence... | 2019-09-05 | Code |
| 75 | Denoising autoencoders (non-autoregressive) | 21.54 | No | Deterministic Non-Autoregressive Neural Sequence... | 2018-02-19 | Code |
| 76 | RNN Enc-Dec Att | 20.9 | No | Effective Approaches to Attention-based Neural M... | 2015-08-17 | Code |
| 77 | FlowSeq-large | 20.85 | No | FlowSeq: Non-Autoregressive Conditional Sequence... | 2019-09-05 | Code |
| 78 | PBMT | 20.7 | No | - | - | - |
| 79 | Deep-Att | 20.7 | No | Deep Recurrent Models with Fast-Forward Connecti... | 2016-06-14 | Code |
| 80 | Phrase Based MT | 20.7 | No | - | - | - |
| 81 | PBSMT + NMT | 20.23 | No | Phrase-Based & Neural Unsupervised Machine Trans... | 2018-04-20 | Code |
| 82 | NAT +FT + NPD | 19.17 | No | Non-Autoregressive Neural Machine Translation | 2017-11-07 | Code |
| 83 | FlowSeq-base | 18.55 | No | FlowSeq: Non-Autoregressive Conditional Sequence... | 2019-09-05 | Code |
| 84 | Seq-KD + Seq-Inter + Word-KD | 18.5 | No | Sequence-Level Knowledge Distillation | 2016-06-25 | Code |
| 85 | Unsupervised PBSMT | 17.94 | No | Phrase-Based & Neural Unsupervised Machine Trans... | 2018-04-20 | Code |
| 86 | NSE-NSE | 17.9 | No | Neural Semantic Encoders | 2016-07-14 | Code |
| 87 | Unsupervised NMT + Transformer | 17.16 | No | Phrase-Based & Neural Unsupervised Machine Trans... | 2018-04-20 | Code |
| 88 | SMT + iterative backtranslation (unsupervised) | 14.08 | No | Unsupervised Statistical Machine Translation | 2018-09-04 | Code |
| 89 | Reverse RNN Enc-Dec | 14.0 | No | Effective Approaches to Attention-based Neural M... | 2015-08-17 | Code |
| 90 | RNN Enc-Dec | 11.3 | No | Effective Approaches to Attention-based Neural M... | 2015-08-17 | Code |
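The scores above are BLEU values. As background, here is a minimal self-contained sketch of corpus-level BLEU (single reference per sentence, uniform n-gram weights up to 4, no smoothing); the function and variable names are illustrative, and published results are typically computed with a standardized tool such as sacreBLEU rather than a hand-rolled implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU: geometric mean of modified (clipped) n-gram
    precisions for n = 1..max_n, multiplied by a brevity penalty.
    `references` and `hypotheses` are parallel lists of token lists."""
    clipped = [0] * max_n   # clipped n-gram matches, per n
    totals = [0] * max_n    # total hypothesis n-grams, per n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = Counter(ngrams(hyp, n))
            ref_counts = Counter(ngrams(ref, n))
            totals[n - 1] += max(len(hyp) - n + 1, 0)
            # Clip each hypothesis n-gram count by its count in the reference.
            clipped[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in hyp_counts.items())
    if min(clipped) == 0:
        return 0.0  # any zero precision collapses the geometric mean
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return 100 * bp * math.exp(log_prec)
```

A perfect hypothesis scores 100; a hypothesis sharing no 4-gram with its reference scores 0 under this unsmoothed variant.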