José A. R. Fonollosa, Noe Casas, Marta R. Costa-jussà
The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over the source and target sequences. In this paper we study a new architecture that breaks with both conventions. Our simplified architecture consists of the decoder part of a transformer model, based on self-attention, but with locality constraints applied to the attention receptive field. During training, both the source and target sentences are fed to the network as a single joint sequence, and the network is trained as a language model. At inference time, the target tokens are predicted autoregressively, with the source sequence serving as the preceding context. The proposed model achieves a new state of the art of 35.7 BLEU on IWSLT'14 German-English and matches the best reported results in the literature on the WMT'14 English-German and WMT'14 English-French translation benchmarks.
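The following is a minimal sketch (not the authors' implementation) of the central mechanism described above: causal self-attention over the concatenated source and target sequence, restricted to a local window of previous tokens. The window size, dimensions, and toy inputs are illustrative assumptions; learned projections, multiple heads, and the rest of the transformer block are omitted for brevity.

```python
# Sketch of locality-constrained causal self-attention over a joint
# [source; target] sequence. Hyperparameters here are illustrative only.
import torch
import torch.nn.functional as F

def local_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend to positions j
    with i - window < j <= i (causal and local)."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

def local_self_attention(x: torch.Tensor, window: int) -> torch.Tensor:
    """Single-head scaled dot-product self-attention with a local
    causal mask. x: (seq_len, d_model); Q/K/V projections omitted."""
    seq_len, d_model = x.shape
    scores = (x @ x.transpose(0, 1)) / d_model ** 0.5
    mask = local_causal_mask(seq_len, window)
    scores = scores.masked_fill(~mask, float('-inf'))
    return F.softmax(scores, dim=-1) @ x

# Training treats the concatenation of source and target as one sequence
# and predicts every next token; at inference, the source acts as the
# given prefix and target tokens are sampled autoregressively.
source = torch.randn(10, 64)  # toy source embeddings
target = torch.randn(8, 64)   # toy target embeddings
joint = torch.cat([source, target], dim=0)
out = local_self_attention(joint, window=5)
print(out.shape)  # torch.Size([18, 64])
```

Because each position attends only to the `window` most recent tokens, attention cost per position is constant rather than linear in sequence length, which is the locality constraint the abstract refers to.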
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | IWSLT'14 German-English | BLEU score | 35.7 | Local Joint Self-attention |
| Machine Translation | WMT'14 English-German | BLEU score | 29.7 | Local Joint Self-attention |
| Machine Translation | WMT'14 English-French | BLEU score | 43.3 | Local Joint Self-attention |