Gumbel-Attention for Multi-modal Machine Translation

Pengbo Liu, Hailong Cao, Tiejun Zhao

2021-03-16 | Machine Translation, Multimodal Machine Translation, Translation

Abstract

Multi-modal machine translation (MMT) improves translation quality by introducing visual information. However, existing MMT models ignore the problem that an image also carries information irrelevant to the text, which introduces considerable noise into the model and degrades translation quality. This paper proposes a novel Gumbel-Attention for multi-modal machine translation, which selects the text-related parts of the image features. Specifically, unlike previous attention-based methods, we first use a differentiable method to select the image information and automatically remove the useless parts of the image features. Experiments show that our method retains the image features related to the text, and that the retained parts help the MMT model generate better translations.
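
To make the idea of differentiable selection more concrete, the following is a minimal sketch of attention with Gumbel noise, written in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the scoring function, the temperature, or the feature shapes, so the scaled dot-product scores, the `tau` parameter, the `hard` flag, and the 5-token by 49-region example are all hypothetical choices. Only the forward pass is shown; in an autodiff framework the hard one-hot selection would typically be paired with a straight-through gradient estimator.

```python
import numpy as np

def sample_gumbel(shape, rng, eps=1e-10):
    # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
    # Clipping keeps both logarithms finite.
    u = np.clip(rng.uniform(0.0, 1.0, size=shape), eps, 1.0 - eps)
    return -np.log(-np.log(u))

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gumbel_attention(text, image, tau=1.0, hard=False, rng=None):
    """Select image regions for each text position (illustrative sketch).

    text:  (n_tokens, d) text representations (assumed shape)
    image: (n_regions, d) image region features (assumed shape)
    tau:   temperature; smaller values give sharper selections (assumption)
    hard:  if True, each token picks exactly one region (one-hot weights)
    """
    rng = np.random.default_rng() if rng is None else rng
    d = text.shape[-1]
    # Assumed scoring function: scaled dot product between text and image.
    scores = text @ image.T / np.sqrt(d)
    # Gumbel-softmax: perturb scores with Gumbel noise, then apply softmax.
    weights = softmax((scores + sample_gumbel(scores.shape, rng)) / tau, axis=-1)
    if hard:
        # Discretize to one-hot; unselected regions contribute nothing.
        weights = np.eye(image.shape[0])[weights.argmax(axis=-1)]
    return weights @ image, weights

rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))     # 5 hypothetical source tokens
image = rng.normal(size=(49, 8))   # 49 hypothetical image regions
attended, weights = gumbel_attention(text, image, tau=0.5, hard=True, rng=rng)
```

With `hard=True`, each row of `weights` is one-hot, so every text position keeps a single image region and discards the rest. With `hard=False`, the weights stay soft and approach a one-hot vector as `tau` decreases.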

Results

Task	Dataset	Metric	Value	Model
Machine Translation	Multi30K	BLEU (EN-DE)	39.2	Gumbel-Attention MMT
Machine Translation	Multi30K	Meteor (EN-DE)	57.8	Gumbel-Attention MMT
Multimodal Machine Translation	Multi30K	BLEU (EN-DE)	39.2	Gumbel-Attention MMT
Multimodal Machine Translation	Multi30K	Meteor (EN-DE)	57.8	Gumbel-Attention MMT
