Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Faithful Multimodal Explanation for Visual Question Answering

Jialin Wu, Raymond J. Mooney

Published: 2018-09-08 · WS 2019
Tasks: Question Answering · Explanatory Visual Question Answering · Visual Question Answering (VQA)
Links: Paper · PDF · Code

Abstract

AI systems' ability to explain their reasoning is critical to their utility and trustworthiness. Deep neural networks have enabled significant progress on many challenging problems such as visual question answering (VQA). However, most of them are opaque black boxes with limited explanatory capability. This paper presents a novel approach to developing a high-performing VQA system that can elucidate its answers with integrated textual and visual explanations that faithfully reflect important aspects of its underlying reasoning while capturing the style of comprehensible human explanations. Extensive experimental evaluation demonstrates the advantages of this approach compared to competing methods with both automatic evaluation metrics and human evaluation metrics.

Results

The EXP model's results on the GQA-REX dataset appear under three task listings — Visual Question Answering (VQA), Visual Question Answering, and Explanatory Visual Question Answering — with identical metrics in each, consolidated below.

Dataset: GQA-REX · Model: EXP

Metric     Value
BLEU-4     42.45
CIDEr      357.1
GQA-test   56.92
GQA-val    65.17
Grounding  33.52
METEOR     34.46
ROUGE-L    73.51
SPICE      40.35
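Several of the metrics above (BLEU-4, METEOR, ROUGE-L, CIDEr, SPICE) score the model's generated explanations against human-written reference explanations by n-gram overlap or related measures. As a rough illustration only — not the evaluation script behind these numbers, which would use corpus-level scoring with smoothing — a minimal single-sentence BLEU-4 can be sketched in plain Python:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(reference, hypothesis):
    """Unsmoothed sentence-level BLEU-4: geometric mean of clipped
    1- to 4-gram precisions, times a brevity penalty."""
    ref = reference.split()
    hyp = hypothesis.split()
    precisions = []
    for n in range(1, 5):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    # Brevity penalty discourages overly short hypotheses.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

score = bleu4("the man is wearing a red shirt",
              "the man is wearing a red shirt")  # identical strings score 1.0
```

In practice, evaluations like the one tabulated above tokenize with the benchmark's own script and average over the whole test set; this sketch only shows the core clipped-precision idea.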

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility (2025-07-16)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)