MultiSubs: A Large-scale Multimodal and Multilingual Dataset

Josiah Wang, Pranava Madhyastha, Josiel Figueiredo, Chiraag Lala, Lucia Specia

Published: 2021-03-02 · Venue: LREC 2022
Tasks: Multimodal Text Prediction, Translation, Multimodal Lexical Translation

Abstract

This paper introduces a large-scale multimodal and multilingual dataset that aims to facilitate research on grounding words to images in their contextual usage in language. The dataset consists of images selected to unambiguously illustrate concepts expressed in sentences from movie subtitles. The dataset is a valuable resource as (i) the images are aligned to text fragments rather than whole sentences; (ii) multiple images are possible for a text fragment and a sentence; (iii) the sentences are free-form and real-world-like; (iv) the parallel texts are multilingual. We set up a fill-in-the-blank game for humans to evaluate the quality of the automatic image selection process of our dataset. We show the utility of the dataset on two automatic tasks: (i) fill-in-the-blank; (ii) lexical translation. Results of the human evaluation and automatic models demonstrate that images can be a useful complement to the textual context. The dataset will benefit research on visual grounding of words, especially in the context of free-form sentences, and can be obtained from https://doi.org/10.5281/zenodo.5034604 under a Creative Commons licence.
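
To make the fill-in-the-blank setup concrete, here is a minimal sketch of how one instance might be represented: a sentence in which one text fragment is blanked out, together with the images aligned to that fragment. The class name, field names, example sentence, and image file names are all hypothetical illustrations and are not taken from the dataset's actual release format.

```python
from dataclasses import dataclass, field


@dataclass
class BlankInstance:
    """Hypothetical container for one fill-in-the-blank example."""
    tokens: list[str]   # tokenised subtitle sentence
    start: int          # index of the first token of the aligned fragment
    end: int            # index one past the last token of the fragment
    images: list[str] = field(default_factory=list)  # images illustrating the fragment

    def masked(self, blank: str = "_____") -> str:
        """Return the sentence with the aligned fragment replaced by a blank."""
        kept = self.tokens[:self.start] + [blank] + self.tokens[self.end:]
        return " ".join(kept)

    def answer(self) -> str:
        """Return the fragment that was blanked out."""
        return " ".join(self.tokens[self.start:self.end])


# Invented example: several images may illustrate the same fragment.
example = BlankInstance(
    tokens="he played the trumpet in the band".split(),
    start=3,
    end=4,
    images=["trumpet_01.jpg", "trumpet_02.jpg"],
)
print(example.masked())
print(example.answer())
```

A model (or a human player) would see the masked sentence, optionally with the images, and would be asked to recover the answer.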

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | MultiSubs English-Spanish | ALI | 0.81 | Multimodal BRNN |
| Machine Translation | MultiSubs English-German | ALI | 0.94 | Multimodal BRNN |
| Machine Translation | MultiSubs English-French | ALI | 0.81 | Multimodal BRNN |
| Machine Translation | MultiSubs English-Portuguese | ALI | 0.8 | Multimodal BRNN |
| Multimodal Machine Translation | MultiSubs English-Spanish | ALI | 0.81 | Multimodal BRNN |
| Multimodal Machine Translation | MultiSubs English-German | ALI | 0.94 | Multimodal BRNN |
| Multimodal Machine Translation | MultiSubs English-French | ALI | 0.81 | Multimodal BRNN |
| Multimodal Machine Translation | MultiSubs English-Portuguese | ALI | 0.8 | Multimodal BRNN |
| Multimodal Text Prediction | MultiSubs | Accuracy | 30.35 | 9-gram LM with back-off |
| Multimodal Text Prediction | MultiSubs | Word similarity | 0.44 | 9-gram LM with back-off |
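
The page does not define how the Accuracy figure for Multimodal Text Prediction is computed. As a rough sketch, assuming it is the percentage of blanks for which the predicted word exactly matches the reference word, it could be calculated as below. The function name, the case-insensitive comparison, and the sample words are assumptions made for illustration only.

```python
def blank_accuracy(predictions, references):
    """Percentage of blanks whose predicted word matches the reference.

    Assumption: matching is exact after trimming whitespace and lowercasing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Invented predictions and references, for illustration only.
preds = ["trumpet", "dog", "car", "river"]
golds = ["trumpet", "cat", "car", "lake"]
print(blank_accuracy(preds, golds))  # 50.0
```

Exact match is a strict criterion: a near-synonym of the reference counts as an error, which may be why the table also reports a separate word-similarity score.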
