
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata

2022-04-05

Tasks: Question Answering, Text Generation, Explanation Generation, Visual Reasoning, Visual Question Answering (VQA)

Abstract

Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with natural language explanations. For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given question. We conducted a user study to confirm that the ground-truth explanations in our proposed dataset are indeed complete and relevant. We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation generation quality for different question and answer types. Additionally, we study the influence of using different numbers of ground-truth explanations on the convergence of natural language generation (NLG) metrics. The CLEVR-X dataset is publicly available at https://explainableml.github.io/CLEVR-X/.

Results

Task | Dataset | Metric | Value | Model
Explanation Generation | CLEVR-X | Acc | 63 | PJ-X
Explanation Generation | CLEVR-X | B4 | 87.4 | PJ-X
Explanation Generation | CLEVR-X | C | 639.8 | PJ-X
Explanation Generation | CLEVR-X | M | 58.9 | PJ-X
Explanation Generation | CLEVR-X | RL | 93.4 | PJ-X
Explanation Generation | CLEVR-X | Acc | 80.3 | FM
Explanation Generation | CLEVR-X | B4 | 78.8 | FM
Explanation Generation | CLEVR-X | C | 566.8 | FM
Explanation Generation | CLEVR-X | M | 52.5 | FM
Explanation Generation | CLEVR-X | RL | 85.8 | FM

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
LaViPlan : Language-Guided Visual Path Planning with RLVR (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)