Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Machine Translation on IWSLT2014 German-English

Metric: BLEU score (higher is better)
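
BLEU measures modified n-gram overlap (typically up to 4-grams) between system output and reference translations, combined with a brevity penalty, on a 0-100 scale. As a minimal sketch of how a corpus-level score is computed, assuming the sacrebleu package (note that many papers on this benchmark report tokenized BLEU from older scripts such as multi-bleu.perl, so scores can differ slightly across tooling):

```python
import sacrebleu  # pip install sacrebleu

# Toy hypotheses/references for illustration; not from the IWSLT14 test set.
hypotheses = [
    "the cat sat on the mat",
    "machine translation is hard",
]
# One reference stream, aligned item-by-item with the hypotheses.
references = [[
    "the cat sat on the mat",
    "machine translation is difficult",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")  # 0-100 scale; higher is better
```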


Results

| # | Model | BLEU score | Extra Data | Paper | Date | Code |
|---|-------|------------|------------|-------|------|------|
| 1 | PiNMT | 40.43 | No | Integrating Pre-trained Language Model into Neur... | 2023-10-30 | - |
| 2 | BiBERT | 38.61 | No | BERT, mBERT, or BiBERT? A Study on Contextualize... | 2021-09-09 | Code |
| 3 | Bi-SimCut | 38.37 | No | Bi-SimCut: A Simple Strategy for Boosting Neural... | 2022-06-06 | Code |
| 4 | Cutoff + Relaxed Attention + LM | 37.96 | Yes | Relaxed Attention for Transformer Models | 2022-09-20 | Code |
| 5 | DRDA | 37.95 | No | Deterministic Reversible Data Augmentation for N... | 2024-06-04 | Code |
| 6 | Transformer + R-Drop + Cutoff | 37.9 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Code |
| 7 | SimCut | 37.81 | No | Bi-SimCut: A Simple Strategy for Boosting Neural... | 2022-06-06 | Code |
| 8 | Cutoff+Knee | 37.78 | No | Wide-minima Density Hypothesis and the Explore-E... | 2020-03-09 | Code |
| 9 | Cutoff | 37.6 | No | A Simple but Tough-to-Beat Data Augmentation App... | 2020-09-29 | Code |
| 10 | CipherDAug | 37.53 | No | CipherDAug: Ciphertext based Data Augmentation f... | 2022-04-01 | Code |
| 11 | Transformer + R-Drop | 37.25 | No | R-Drop: Regularized Dropout for Neural Networks | 2021-06-28 | Code |
| 12 | Data Diversification | 37.2 | No | Data Diversification: A Simple Strategy For Neur... | 2019-11-05 | Code |
| 13 | UniDrop | 36.88 | No | UniDrop: A Simple yet Effective Technique to Imp... | 2021-04-11 | - |
| 14 | MixedRepresentations | 36.41 | No | - | - | Code |
| 15 | Mask Attention Network (small) | 36.3 | No | Mask Attention Networks: Rethinking and Strength... | 2021-03-25 | Code |
| 16 | MUSE (Parallel Multi-scale Attention) | 36.3 | No | MUSE: Parallel Multi-Scale Attention for Sequenc... | 2019-11-17 | Code |
| 17 | Transformer+Rep(Sim)+WDrop | 36.22 | No | Rethinking Perturbations in Encoder-Decoders for... | 2021-04-05 | Code |
| 18 | MAT | 36.22 | No | Multi-branch Attentive Transformer | 2020-06-18 | Code |
| 19 | TransformerBase + AutoDropout | 35.8 | No | AutoDropout: Learning Dropout Patterns to Regula... | 2021-01-05 | Code |
| 20 | Local Joint Self-attention | 35.7 | No | Joint Source-Target Self Attention with Locality... | 2019-05-16 | Code |
| 21 | TaLK Convolutions | 35.5 | No | Time-aware Large Kernel Convolutions | 2020-02-08 | Code |
| 22 | ImitKD + Full | 35.4 | No | Autoregressive Knowledge Distillation through Im... | 2020-09-15 | Code |
| 23 | DeLighT | 35.3 | No | DeLighT: Deep and Light-weight Transformer | 2020-08-03 | Code |
| 24 | DynamicConv | 35.2 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Code |
| 25 | Transformer | 35.1385 | No | Guidelines for the Regularization of Gammas in B... | 2022-05-15 | - |
| 26 | LightConv | 34.8 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Code |
| 27 | Transformer | 34.44 | No | Attention Is All You Need | 2017-06-12 | Code |
| 28 | Rfa-Gate-arccos | 34.4 | No | Random Feature Attention | 2021-03-03 | - |
| 29 | Variational Attention | 33.1 | No | Latent Alignment and Variational Attention | 2018-07-10 | Code |
| 30 | Minimum Risk Training [Edunov2017] | 32.84 | No | Classical Structured Prediction Losses for Seque... | 2017-11-14 | Code |
| 31 | CNAT | 31.15 | No | Non-Autoregressive Translation by Learning Targe... | 2021-03-21 | Code |
| 32 | Neural PBMT + LM [Huang2018] | 30.08 | No | Towards Neural Phrase-based Machine Translation | 2017-06-17 | Code |
| 33 | Back-Translation Finetuning | 28.83 | Yes | Tag-less Back-Translation | 2019-12-22 | - |
| 34 | Actor-Critic [Bahdanau2017] | 28.53 | No | An Actor-Critic Algorithm for Sequence Prediction | 2016-07-24 | Code |
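
As a rough end-to-end illustration of the pipeline behind these numbers (translate the test sentences, then score against references), the sketch below assumes the transformers and sacrebleu packages and the public Helsinki-NLP/opus-mt-de-en checkpoint, which is not an entry in the table above. The sentences are toy examples, not IWSLT14 data; the listed systems instead train on the IWSLT14 de-en training split (roughly 160k sentence pairs) and evaluate on its standard test set.

```python
import sacrebleu
from transformers import pipeline  # pip install transformers sacrebleu

# Off-the-shelf checkpoint chosen for illustration; it is NOT one of the
# systems on this leaderboard.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

sources = [
    "Das Wetter ist heute schön.",
    "Ich lerne maschinelle Übersetzung.",
]
references = [[
    "The weather is nice today.",
    "I am learning machine translation.",
]]

# Translate, then compute corpus-level BLEU against the references.
hypotheses = [out["translation_text"] for out in translator(sources)]
print(sacrebleu.corpus_bleu(hypotheses, references).score)
```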