Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


The Evolved Transformer

David R. So, Chen Liang, Quoc V. Le

Published: 2019-01-30 · Tasks: Machine Translation, Neural Architecture Search

Abstract

Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by the recent advances in feed-forward sequence models and then run evolutionary architecture search with warm starting by seeding our initial population with the Transformer. To directly search on the computationally expensive WMT 2014 English-German translation task, we develop the Progressive Dynamic Hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments -- the Evolved Transformer -- demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech and LM1B. At a big model size, the Evolved Transformer establishes a new state-of-the-art BLEU score of 29.8 on WMT'14 English-German; at smaller sizes, it achieves the same quality as the original "big" Transformer with 37.6% fewer parameters and outperforms the Transformer by 0.7 BLEU at a mobile-friendly model size of 7M parameters.
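The Progressive Dynamic Hurdles method mentioned in the abstract grants additional training budget only to candidates whose early fitness clears a hurdle derived from their peers. A minimal sketch of that idea, assuming illustrative `mutate` and `train` callables (the paper's actual implementation, hurdle schedule, and fitness measure differ):

```python
def progressive_dynamic_hurdles(seed_model, mutate, train,
                                num_children=30, step_budgets=(1, 2, 4)):
    """Illustrative sketch of Progressive Dynamic Hurdles.

    Each child trains for a small budget first; only children whose
    fitness clears the hurdle (here, the median fitness of peers that
    reached the same cumulative budget) earn the next, larger budget.
    """
    population = []  # list of (fitness, total_budget_spent) pairs
    for _ in range(num_children):
        child = mutate(seed_model)  # warm start: seed with the Transformer
        fitness, spent = None, 0
        for budget in step_budgets:
            fitness = train(child, budget)  # returns validation fitness
            spent += budget
            # hurdle: median fitness among peers that got at least this far
            peers = sorted(f for f, s in population if s >= spent)
            if peers and fitness < peers[len(peers) // 2]:
                break  # stop spending compute on this child
        population.append((fitness, spent))
    return max(population)  # best (fitness, budget_spent) pair
```

The key property this sketches is that weak candidates are abandoned after a cheap training step, so total compute concentrates on the promising ones rather than being split evenly.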

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Machine Translation | WMT2014 English-Czech | BLEU score | 28.2 | Evolved Transformer Big |
| Machine Translation | WMT2014 English-Czech | BLEU score | 27.6 | Evolved Transformer Base |
| Machine Translation | WMT2014 English-German | BLEU score | 29.8 | Evolved Transformer Big |
| Machine Translation | WMT2014 English-German | SacreBLEU | 29.2 | Evolved Transformer Big |
| Machine Translation | WMT2014 English-German | BLEU score | 28.4 | Evolved Transformer Base |
| Machine Translation | WMT2014 English-French | BLEU score | 41.3 | Evolved Transformer Big |
| Machine Translation | WMT2014 English-French | BLEU score | 40.6 | Evolved Transformer Base |
| Language Modelling | One Billion Word | PPL | 28.6 | Evolved Transformer Big |

Related Papers

DASViT: Differentiable Architecture Search for Vision Transformer (2025-07-17)
Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings (2025-07-09)
GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation (2025-07-04)
TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation (2025-07-01)
Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval (2025-06-26)
Intrinsic vs. Extrinsic Evaluation of Czech Sentence Embeddings: Semantic Relevance Doesn't Help with MT Evaluation (2025-06-25)
CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation (2025-06-24)