A Convolutional Encoder Model for Neural Machine Translation

Jonas Gehring, Michael Auli, David Grangier, Yann N. Dauphin

2016-11-07ACL 2017 7Machine Translation Translation

Abstract

The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. In this paper we present a faster and simpler architecture based on a succession of convolutional layers. This allows to encode the entire source sentence simultaneously compared to recurrent networks for which computation is constrained by temporal dependencies. On WMT'16 English-Romanian translation we achieve competitive accuracy to the state-of-the-art and we outperform several recently published results on the WMT'15 English-German task. Our models obtain almost the same accuracy as a very deep LSTM setup on WMT'14 English-French translation. Our convolutional encoder speeds up CPU decoding by more than two times at the same or higher accuracy as a strong bi-directional LSTM baseline.

Results

Task	Dataset	Metric	Value	Model
Machine Translation	IWSLT2015 German-English	BLEU score	30.4	Conv-LSTM (deep+pos)
Machine Translation	WMT2016 English-Romanian	BLEU score	27.8	Deep Convolutional Encoder; single-layer decoder
Machine Translation	WMT2016 English-Romanian	BLEU score	27.5	BiLSTM
Machine Translation	WMT2014 English-French	BLEU score	35.7	Deep Convolutional Encoder; single-layer decoder

Related Papers

A Translation of Probabilistic Event Calculus into Markov Decision Processes2025-07-17 Function-to-Style Guidance of LLMs for Code Translation2025-07-15 Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation2025-07-09 Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings2025-07-09 Unconditional Diffusion for Generative Sequential Recommendation2025-07-08 GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation2025-07-04 TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation2025-07-01 CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation2025-06-29