Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Machine Translation on WMT2014 English-German

Metric: BLEU score (higher is better)
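
BLEU is a corpus-level n-gram overlap metric on a 0-100 scale. Below is a minimal sketch of computing it with the `sacrebleu` package; this is an assumption, since the leaderboard does not say which tool produced each entry's score, and the underlying papers differ in tokenization, casing, and test-set preparation.

```python
# Minimal sketch: corpus-level BLEU with sacrebleu (assumed package;
# the leaderboard does not specify how each entry was scored).
import sacrebleu

hypotheses = [
    "The cat sat on the mat.",
    "It is raining heavily today.",
]
# A list of reference streams: each inner list is one complete set of
# references, aligned sentence-by-sentence with the hypotheses.
references = [
    ["The cat is sitting on the mat.", "It rains heavily today."],
]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")  # 0-100, higher is better
```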


Results

| # | Model | BLEU score | Extra Data | Paper | Date | Code |
|---:|---|---:|---|---|---|---|
| 1 | Transformer Cycle (Rev) | 35.14 | No | Lessons on Parameter Sharing across Layers in Tr... | 2021-04-13 | Code |
| 2 | Noisy back-translation | 35 | Yes | Understanding Back-Translation at Scale | 2018-08-28 | Code |
| 3 | Transformer+Rep(Uni) | 33.89 | No | Rethinking Perturbations in Encoder-Decoders for... | 2021-04-05 | Code |
| 4 | T5-11B | 32.1 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 5 | BiBERT | 31.26 | No | BERT, mBERT, or BiBERT? A Study on Contextualize... | 2021-09-09 | Code |
| 6 | Transformer + R-Drop | 30.91 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Code |
| 7 | Bi-SimCut | 30.78 | No | Bi-SimCut: A Simple Strategy for Boosting Neural... | 2022-06-06 | Code |
| 8 | BERT-fused NMT | 30.75 | No | Incorporating BERT into Neural Machine Translation | 2020-02-17 | Code |
| 9 | Data Diversification - Transformer | 30.7 | No | Data Diversification: A Simple Strategy For Neur... | 2019-11-05 | Code |
| 10 | SimCut | 30.56 | No | Bi-SimCut: A Simple Strategy for Boosting Neural... | 2022-06-06 | Code |
| 11 | Mask Attention Network (big) | 30.4 | No | Mask Attention Networks: Rethinking and Strength... | 2021-03-25 | Code |
| 12 | Transformer (ADMIN init) | 30.1 | No | Very Deep Transformers for Neural Machine Transl... | 2020-08-18 | Code |
| 13 | PowerNorm (Transformer) | 30.1 | No | PowerNorm: Rethinking Batch Normalization in Tra... | 2020-03-17 | Code |
| 14 | Depth Growing | 30.07 | No | Depth Growing for Neural Machine Translation | 2019-07-03 | Code |
| 15 | MUSE (Parallel Multi-scale Attention) | 29.9 | No | MUSE: Parallel Multi-Scale Attention for Sequenc... | 2019-11-17 | Code |
| 16 | Evolved Transformer Big | 29.8 | No | The Evolved Transformer | 2019-01-30 | Code |
| 17 | OmniNetP | 29.8 | No | OmniNet: Omnidirectional Representations from Tr... | 2021-03-01 | Code |
| 18 | DynamicConv | 29.7 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Code |
| 19 | Local Joint Self-attention | 29.7 | No | Joint Source-Target Self Attention with Locality... | 2019-05-16 | Code |
| 20 | TaLK Convolutions | 29.6 | No | Time-aware Large Kernel Convolutions | 2020-02-08 | Code |
| 21 | Transformer Big + MoS | 29.6 | No | Fast and Simple Mixture of Softmaxes with BPE an... | 2018-09-25 | Code |
| 22 | AdvAug (aut+adv) | 29.57 | No | AdvAug: Robust Adversarial Augmentation for Neur... | 2020-06-21 | - |
| 23 | PartialFormer | 29.56 | No | PartialFormer: Modeling Part Instead of Whole fo... | 2023-10-23 | Code |
| 24 | Transformer Big + adversarial MLE | 29.52 | No | Improving Neural Language Modeling via Adversari... | 2019-06-10 | Code |
| 25 | Transformer Big | 29.3 | No | Scaling Neural Machine Translation | 2018-06-01 | Code |
| 26 | Subformer-xlarge | 29.3 | No | - | - | - |
| 27 | SB-NMT | 29.21 | No | Synchronous Bidirectional Neural Machine Transla... | 2019-05-13 | Code |
| 28 | Transformer (big) + Relative Position Representations | 29.2 | No | Self-Attention with Relative Position Representa... | 2018-03-06 | Code |
| 29 | FLOATER-large | 29.2 | No | Learning to Encode Position for Transformer with... | 2020-03-13 | Code |
| 30 | Local Transformer | 29.2 | No | Modeling Localness for Self-Attention Networks | 2018-10-24 | - |
| 31 | Transformer Big with FRAGE | 29.11 | No | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Code |
| 32 | Mask Attention Network (base) | 29.1 | No | Mask Attention Networks: Rethinking and Strength... | 2021-03-25 | Code |
| 33 | Mega | 29.01 | No | Mega: Moving Average Equipped Gated Attention | 2022-09-21 | Code |
| 34 | Adequacy-oriented NMT | 28.99 | No | Neural Machine Translation with Adequacy-Oriente... | 2018-11-21 | - |
| 35 | LightConv | 28.9 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Code |
| 36 | Weighted Transformer (large) | 28.9 | No | Weighted Transformer Network for Machine Transla... | 2017-11-06 | Code |
| 37 | Universal Transformer base | 28.9 | No | Universal Transformers | 2018-07-10 | Code |
| 38 | KERMIT | 28.7 | No | KERMIT: Generative Insertion-Based Modeling for ... | 2019-06-04 | - |
| 39 | T2R + Pretrain | 28.7 | No | Finetuning Pretrained Transformers into RNNs | 2021-03-24 | Code |
| 40 | AdvAug (aut) | 28.58 | No | AdvAug: Robust Adversarial Augmentation for Neur... | 2020-06-21 | - |
| 41 | RNMT+ | 28.5 | No | The Best of Both Worlds: Combining Recent Advanc... | 2018-04-26 | Code |
| 42 | Synthesizer (Random + Vanilla) | 28.47 | No | Synthesizer: Rethinking Self-Attention in Transf... | 2020-05-02 | Code |
| 43 | Hardware Aware Transformer | 28.4 | No | HAT: Hardware-Aware Transformers for Efficient N... | 2020-05-28 | Code |
| 44 | Transformer Big | 28.4 | No | Attention Is All You Need | 2017-06-12 | Code |
| 45 | Transformer + SRU | 28.4 | No | Simple Recurrent Units for Highly Parallelizable... | 2017-09-08 | Code |
| 46 | Evolved Transformer Base | 28.4 | No | The Evolved Transformer | 2019-01-30 | Code |
| 47 | Rfa-Gate-arccos | 28.2 | No | Random Feature Attention | 2021-03-03 | - |
| 48 | Transformer-DRILL Base | 28.1 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 49 | AdvAug (mixup) | 28.08 | No | AdvAug: Robust Adversarial Augmentation for Neur... | 2020-06-21 | - |
| 50 | CMLM+LAT+4 iterations | 27.35 | No | Incorporating a Local Translation Mechanism into... | 2020-11-12 | Code |
| 51 | Transformer Base | 27.3 | No | Attention Is All You Need | 2017-06-12 | Code |
| 52 | Levenshtein Transformer (distillation) | 27.27 | No | Levenshtein Transformer | 2019-05-27 | Code |
| 53 | DisCo + Mask-Predict (non-autoregressive) | 27.06 | No | - | - | Code |
| 54 | Adaptively Sparse Transformer (alpha-entmax) | 26.93 | No | Adaptively Sparse Transformers | 2019-08-30 | Code |
| 55 | ResMLP-12 | 26.8 | No | ResMLP: Feedforward networks for image classific... | 2021-05-07 | Code |
| 56 | CNAT | 26.6 | No | Non-Autoregressive Translation by Learning Targe... | 2021-03-21 | Code |
| 57 | Lite Transformer | 26.5 | No | Lite Transformer with Long-Short Range Attention | 2020-04-24 | Code |
| 58 | ConvS2S (ensemble) | 26.4 | No | Convolutional Sequence to Sequence Learning | 2017-05-08 | Code |
| 59 | ResMLP-6 | 26.4 | No | ResMLP: Feedforward networks for image classific... | 2021-05-07 | Code |
| 60 | Average Attention Network | 26.31 | No | Accelerating Neural Transformer via an Average A... | 2018-05-02 | Code |
| 61 | GNMT+RL | 26.3 | No | Google's Neural Machine Translation System: Brid... | 2016-09-26 | Code |
| 62 | SliceNet | 26.1 | No | Depthwise Separable Convolutions for Neural Mach... | 2017-06-09 | Code |
| 63 | Average Attention Network (w/o FFN) | 26.05 | No | Accelerating Neural Transformer via an Average A... | 2018-05-02 | Code |
| 64 | MoE | 26.03 | No | Outrageously Large Neural Networks: The Sparsely... | 2017-01-23 | Code |
| 65 | Average Attention Network (w/o gate) | 25.91 | No | Accelerating Neural Transformer via an Average A... | 2018-05-02 | Code |
| 66 | Adaptively Sparse Transformer (1.5-entmax) | 25.89 | No | Adaptively Sparse Transformers | 2019-08-30 | Code |
| 67 | DenseNMT | 25.52 | No | Dense Information Flow for Neural Machine Transl... | 2018-06-03 | Code |
| 68 | GLAT | 25.21 | No | Glancing Transformer for Non-Autoregressive Neur... | 2020-08-18 | Code |
| 69 | CMLM+LAT+1 iteration | 25.2 | No | Incorporating a Local Translation Mechanism into... | 2020-11-12 | Code |
| 70 | ConvS2S | 25.16 | No | Convolutional Sequence to Sequence Learning | 2017-05-08 | Code |
| 71 | ByteNet | 23.75 | No | Neural Machine Translation in Linear Time | 2016-10-31 | Code |
| 72 | FlowSeq-large (NPD n = 30) | 23.64 | No | FlowSeq: Non-Autoregressive Conditional Sequence... | 2019-09-05 | Code |
| 73 | FlowSeq-large (NPD n = 15) | 23.14 | No | FlowSeq: Non-Autoregressive Conditional Sequence... | 2019-09-05 | Code |
| 74 | FlowSeq-large (IWD n = 15) | 22.94 | No | FlowSeq: Non-Autoregressive Conditional Sequence... | 2019-09-05 | Code |
| 75 | Denoising autoencoders (non-autoregressive) | 21.54 | No | Deterministic Non-Autoregressive Neural Sequence... | 2018-02-19 | Code |
| 76 | RNN Enc-Dec Att | 20.9 | No | Effective Approaches to Attention-based Neural M... | 2015-08-17 | Code |
| 77 | FlowSeq-large | 20.85 | No | FlowSeq: Non-Autoregressive Conditional Sequence... | 2019-09-05 | Code |
| 78 | PBMT | 20.7 | No | - | - | - |
| 79 | Deep-Att | 20.7 | No | Deep Recurrent Models with Fast-Forward Connecti... | 2016-06-14 | Code |
| 80 | Phrase Based MT | 20.7 | No | - | - | - |
| 81 | PBSMT + NMT | 20.23 | No | Phrase-Based & Neural Unsupervised Machine Trans... | 2018-04-20 | Code |
| 82 | NAT + FT + NPD | 19.17 | No | Non-Autoregressive Neural Machine Translation | 2017-11-07 | Code |
| 83 | FlowSeq-base | 18.55 | No | FlowSeq: Non-Autoregressive Conditional Sequence... | 2019-09-05 | Code |
| 84 | Seq-KD + Seq-Inter + Word-KD | 18.5 | No | Sequence-Level Knowledge Distillation | 2016-06-25 | Code |
| 85 | Unsupervised PBSMT | 17.94 | No | Phrase-Based & Neural Unsupervised Machine Trans... | 2018-04-20 | Code |
| 86 | NSE-NSE | 17.9 | No | Neural Semantic Encoders | 2016-07-14 | Code |
| 87 | Unsupervised NMT + Transformer | 17.16 | No | Phrase-Based & Neural Unsupervised Machine Trans... | 2018-04-20 | Code |
| 88 | SMT + iterative backtranslation (unsupervised) | 14.08 | No | Unsupervised Statistical Machine Translation | 2018-09-04 | Code |
| 89 | Reverse RNN Enc-Dec | 14 | No | Effective Approaches to Attention-based Neural M... | 2015-08-17 | Code |
| 90 | RNN Enc-Dec | 11.3 | No | Effective Approaches to Attention-based Neural M... | 2015-08-17 | Code |
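
To score a system against this table end to end, its detokenized outputs for the newstest2014 English-German test set can be evaluated with sacrebleu's bundled test sets. The sketch below assumes sacrebleu 2.x (whose `utils` module can fetch the official reference files) and a hypothetical output file `my_system.de`:

```python
# Sketch: score detokenized system output on WMT14 En-De.
# Assumptions: sacrebleu 2.x; `my_system.de` is a hypothetical file
# with one translated sentence per line, aligned with the test set.
import sacrebleu
from sacrebleu.utils import get_reference_files

# Downloads the test set on first use and returns local file paths.
ref_paths = get_reference_files("wmt14", "en-de")
references = [
    [line.rstrip("\n") for line in open(path, encoding="utf-8")]
    for path in ref_paths
]

with open("my_system.de", encoding="utf-8") as f:
    hypotheses = [line.rstrip("\n") for line in f]

print(sacrebleu.corpus_bleu(hypotheses, references))
```

Note that many papers in this table report tokenized BLEU (e.g. via multi-bleu.perl with compound splitting), which is not directly comparable to sacrebleu's detokenized score.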