


Latent Variable Model for Multi-modal Translation

Iacer Calixto, Miguel Rios, Wilker Aziz

2018-11-01 | ACL 2019 7 | Machine Translation, Multimodal Machine Translation, Translation, Multi-Task Learning

Abstract

In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model. This latent variable can be seen as a multi-modal stochastic embedding of an image and its description in a foreign language. It is used in a target-language decoder and also to predict image features. Importantly, our model formulation utilises visual and textual inputs during training but does not require that images be available at test time. We show that our latent variable MMT formulation improves considerably over strong baselines, including a multi-task learning approach (Elliott and Kádár, 2017) and a conditional variational auto-encoder approach (Toyama et al., 2016). Finally, we show improvements due to (i) predicting image features in addition to only conditioning on them, (ii) imposing a constraint on the minimum amount of information encoded in the latent variable, and (iii) training on additional target-language image descriptions (i.e. synthetic data).
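The training objective implied by the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names (`gaussian_kl`, `latent_mmt_loss`), the standard-normal prior, the diagonal Gaussian posterior, and the "floor on the KL term" reading of the minimum-information constraint are all assumptions made here for illustration. The abstract only states that the latent variable feeds the decoder, is used to predict image features, and is constrained to encode a minimum amount of information.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Assumption: a diagonal Gaussian posterior and a standard-normal prior.
    The source does not specify either distribution.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def latent_mmt_loss(translation_nll, image_nll, mu, log_var, min_kl=2.0):
    """Combine the three terms suggested by the abstract (hypothetical form).

    translation_nll: negative log-likelihood of the target sentence given
                     the source sentence and the latent variable.
    image_nll:       negative log-likelihood of the image features
                     predicted from the latent variable.
    min_kl:          hypothetical floor on the KL term, one possible way
                     to require a minimum amount of encoded information.
    """
    kl = gaussian_kl(mu, log_var)
    # The KL term is not pushed below min_kl, so the optimiser gains
    # nothing from collapsing the posterior onto the prior.
    constrained_kl = max(kl, min_kl)
    return translation_nll + image_nll + constrained_kl


if __name__ == "__main__":
    # Toy numbers, chosen only to exercise the function.
    print(latent_mmt_loss(10.0, 3.0, mu=[3.0, 0.0], log_var=[0.0, 0.0]))
```

Under these assumptions the image is needed only to compute `image_nll` and the posterior parameters during training; at test time a latent value could be drawn from a text-only prior, which is consistent with the abstract's claim that images are not required at test time.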

Results

Task                            Dataset   Metric          Value  Model
Machine Translation             Multi30K  BLEU (EN-DE)    37.6   VMMTF
Machine Translation             Multi30K  METEOR (EN-DE)  56     VMMTF
Multimodal Machine Translation  Multi30K  BLEU (EN-DE)    37.6   VMMTF
Multimodal Machine Translation  Multi30K  METEOR (EN-DE)  56     VMMTF
