Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Machine Translation on IWSLT2014 German-English

Metric: BLEU score (higher is better)
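
BLEU measures modified n-gram overlap (typically up to 4-grams) between system output and reference translations, combined with a brevity penalty, on a 0-100 scale. As a minimal sketch of how a corpus-level score is computed, assuming the sacrebleu package (note that many papers on this benchmark report tokenized BLEU from older scripts such as multi-bleu.perl, so scores can differ slightly across tooling):

```python
import sacrebleu  # pip install sacrebleu

# Toy hypotheses/references for illustration; not from the IWSLT14 test set.
hypotheses = [
    "the cat sat on the mat",
    "machine translation is hard",
]
# One reference stream, aligned item-by-item with the hypotheses.
references = [[
    "the cat sat on the mat",
    "machine translation is difficult",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")  # 0-100 scale; higher is better
```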


Results

| # | Model | BLEU score | Extra Data | Paper | Date | Code |
|---|-------|------------|------------|-------|------|------|
| 1 | PiNMT | 40.43 | No | Integrating Pre-trained Language Model into Neur... | 2023-10-30 | - |
| 2 | BiBERT | 38.61 | No | BERT, mBERT, or BiBERT? A Study on Contextualize... | 2021-09-09 | Code |
| 3 | Bi-SimCut | 38.37 | No | Bi-SimCut: A Simple Strategy for Boosting Neural... | 2022-06-06 | Code |
| 4 | Cutoff + Relaxed Attention + LM | 37.96 | Yes | Relaxed Attention for Transformer Models | 2022-09-20 | Code |
| 5 | DRDA | 37.95 | No | Deterministic Reversible Data Augmentation for N... | 2024-06-04 | Code |
| 6 | Transformer + R-Drop + Cutoff | 37.9 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Code |
| 7 | SimCut | 37.81 | No | Bi-SimCut: A Simple Strategy for Boosting Neural... | 2022-06-06 | Code |
| 8 | Cutoff+Knee | 37.78 | No | Wide-minima Density Hypothesis and the Explore-E... | 2020-03-09 | Code |
| 9 | Cutoff | 37.6 | No | A Simple but Tough-to-Beat Data Augmentation App... | 2020-09-29 | Code |
| 10 | CipherDAug | 37.53 | No | CipherDAug: Ciphertext based Data Augmentation f... | 2022-04-01 | Code |
| 11 | Transformer + R-Drop | 37.25 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Code |
| 12 | Data Diversification | 37.2 | No | Data Diversification: A Simple Strategy For Neur... | 2019-11-05 | Code |
| 13 | UniDrop | 36.88 | No | UniDrop: A Simple yet Effective Technique to Imp... | 2021-04-11 | - |
| 14 | MixedRepresentations | 36.41 | No | - | - | Code |
| 15 | Mask Attention Network (small) | 36.3 | No | Mask Attention Networks: Rethinking and Strength... | 2021-03-25 | Code |
| 16 | MUSE (Parallel Multi-scale Attention) | 36.3 | No | MUSE: Parallel Multi-Scale Attention for Sequenc... | 2019-11-17 | Code |
| 17 | Transformer+Rep(Sim)+WDrop | 36.22 | No | Rethinking Perturbations in Encoder-Decoders for... | 2021-04-05 | Code |
| 18 | MAT | 36.22 | No | Multi-branch Attentive Transformer | 2020-06-18 | Code |
| 19 | TransformerBase + AutoDropout | 35.8 | No | AutoDropout: Learning Dropout Patterns to Regula... | 2021-01-05 | Code |
| 20 | Local Joint Self-attention | 35.7 | No | Joint Source-Target Self Attention with Locality... | 2019-05-16 | Code |
| 21 | TaLK Convolutions | 35.5 | No | Time-aware Large Kernel Convolutions | 2020-02-08 | Code |
| 22 | ImitKD + Full | 35.4 | No | Autoregressive Knowledge Distillation through Im... | 2020-09-15 | Code |
| 23 | DeLighT | 35.3 | No | DeLighT: Deep and Light-weight Transformer | 2020-08-03 | Code |
| 24 | DynamicConv | 35.2 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Code |
| 25 | Transformer | 35.1385 | No | Guidelines for the Regularization of Gammas in B... | 2022-05-15 | - |
| 26 | LightConv | 34.8 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Code |
| 27 | Transformer | 34.44 | No | Attention Is All You Need | 2017-06-12 | Code |
| 28 | Rfa-Gate-arccos | 34.4 | No | Random Feature Attention | 2021-03-03 | - |
| 29 | Variational Attention | 33.1 | No | Latent Alignment and Variational Attention | 2018-07-10 | Code |
| 30 | Minimum Risk Training [Edunov2017] | 32.84 | No | Classical Structured Prediction Losses for Seque... | 2017-11-14 | Code |
| 31 | CNAT | 31.15 | No | Non-Autoregressive Translation by Learning Targe... | 2021-03-21 | Code |
| 32 | Neural PBMT + LM [Huang2018] | 30.08 | No | Towards Neural Phrase-based Machine Translation | 2017-06-17 | Code |
| 33 | Back-Translation Finetuning | 28.83 | Yes | Tag-less Back-Translation | 2019-12-22 | - |
| 34 | Actor-Critic [Bahdanau2017] | 28.53 | No | An Actor-Critic Algorithm for Sequence Prediction | 2016-07-24 | Code |
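
As a rough end-to-end illustration of the pipeline behind these numbers (translate the test sentences, then score against references), the sketch below assumes the transformers and sacrebleu packages and the public Helsinki-NLP/opus-mt-de-en checkpoint, which is not an entry in the table above. The sentences are toy examples, not IWSLT14 data; the listed systems instead train on the IWSLT14 de-en training split (roughly 160k sentence pairs) and evaluate on its standard test set.

```python
import sacrebleu
from transformers import pipeline  # pip install transformers sacrebleu

# Off-the-shelf checkpoint chosen for illustration; it is NOT one of the
# systems on this leaderboard.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

sources = [
    "Das Wetter ist heute schön.",
    "Ich lerne maschinelle Übersetzung.",
]
references = [[
    "The weather is nice today.",
    "I am learning machine translation.",
]]

# Translate, then compute corpus-level BLEU against the references.
hypotheses = [out["translation_text"] for out in translator(sources)]
print(sacrebleu.corpus_bleu(hypotheses, references).score)
```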