Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Very Deep Transformers for Neural Machine Translation

Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao

Date: 2020-08-18. Tasks: Machine Translation, NMT, Translation

Abstract

We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU and 46.4 BLEU with back-translation) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: https://github.com/namisan/exdeep-nmt.
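The abstract does not describe the initialization itself, and the results below label it ADMIN. Assuming this refers to Admin (Adaptive Model Initialization) from Liu et al. (2020), "Understanding the Difficulty of Training Transformers", which keeps the Post-LN residual form but rescales each shortcut as x_i = LN(ω_i · x_{i-1} + f_i(x_{i-1})) with ω_i set from sublayer output variances measured in a profiling forward pass, a toy NumPy sketch might look like the following. The sublayers are hypothetical random tanh maps standing in for attention and feed-forward blocks, the helper names (`admin_omegas`, `make_sublayers`, etc.) are illustrative only, and seeding the variance sum with Var[x_0] so that ω_1 is nonzero is our assumption, not a detail taken from the paper.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each row (token vector) to zero mean and unit variance.
    # No learned gain/bias, to keep the sketch small.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def make_sublayers(n, d, rng):
    # Hypothetical stand-ins for attention/FFN sublayers: random d x d weights.
    return [rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d)) for _ in range(n)]


def sublayer(W, x):
    # f_i(x): a toy nonlinear sublayer, not a real attention or FFN block.
    return np.tanh(x @ W)


def admin_omegas(x0, Ws):
    # Profiling pass with plain Post-LN (omega = 1): record Var[f_i(x_{i-1})]
    # and set omega_i = sqrt(Var[x0] + sum_{j<i} Var[f_j(x_{j-1})]).
    # Seeding the sum with Var[x0] is an assumption of this sketch.
    running = x0.var()
    omegas = []
    x = x0
    for W in Ws:
        omegas.append(np.sqrt(running))
        f = sublayer(W, x)
        running += f.var()
        x = layer_norm(x + f)
    return np.array(omegas)


def forward(x, Ws, omegas):
    # Post-LN residual stack with a rescaled shortcut: x <- LN(omega * x + f(x)).
    for W, omega in zip(Ws, omegas):
        x = layer_norm(omega * x + sublayer(W, x))
    return x


# Usage: 60 encoder layers x 2 sublayers = 120 residual blocks, width 64.
rng = np.random.default_rng(0)
x0 = layer_norm(rng.normal(size=(8, 64)))
Ws = make_sublayers(120, 64, rng)
omegas = admin_omegas(x0, Ws)
y = forward(x0, Ws, omegas)
print(omegas[:3], omegas[-1])
```

In this sketch ω grows with depth, so in later blocks the shortcut term outweighs the freshly added sublayer output before LayerNorm. As we understand Admin, limiting how much any single sublayer can perturb the residual stream early in training is what is meant to keep very deep Post-LN stacks, such as the 60-layer encoder above, trainable.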

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | WMT2014 English-German | BLEU score | 30.1 | Transformer (ADMIN init) |
| Machine Translation | WMT2014 English-German | SacreBLEU | 29.5 | Transformer (ADMIN init) |
| Machine Translation | WMT2014 English-French | BLEU score | 46.4 | Transformer+BT (ADMIN init) |
| Machine Translation | WMT2014 English-French | SacreBLEU | 44.4 | Transformer+BT (ADMIN init) |
| Machine Translation | WMT2014 English-French | BLEU score | 43.8 | Transformer (ADMIN init) |
| Machine Translation | WMT2014 English-French | SacreBLEU | 41.8 | Transformer (ADMIN init) |

Related Papers

A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings (2025-07-09)
Unconditional Diffusion for Generative Sequential Recommendation (2025-07-08)
GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation (2025-07-04)
TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation (2025-07-01)
CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation (2025-06-29)