Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


REX: Reasoning-aware and Grounded Explanation

Shi Chen, Qi Zhao

2022-03-11 · CVPR 2022
Tasks: Visual Grounding, Explanatory Visual Question Answering, Explanation Generation, Transfer Learning, Decision Making, Visual Reasoning, Multi-Task Learning, FS-MEVQA
Paper · PDF · Code (official)

Abstract

Effectiveness and interpretability are two essential properties for trustworthy AI systems. Most recent studies in visual reasoning are dedicated to improving the accuracy of predicted answers, and less attention is paid to explaining the rationales behind the decisions. As a result, such models commonly exploit spurious biases instead of actually reasoning on the visual-textual data, and have yet to develop the capability to explain their decision making by considering key information from both modalities. This paper aims to close the gap from three distinct perspectives: First, we define a new type of multi-modal explanation that explains decisions by progressively traversing the reasoning process and grounding keywords in the images. We develop a functional program to sequentially execute different reasoning steps and construct a new dataset with 1,040,830 multi-modal explanations. Second, we identify the critical need to tightly couple important components across the visual and textual modalities for explaining the decisions, and propose a novel explanation generation method that explicitly models the pairwise correspondence between words and regions of interest. It improves the visual grounding capability by a considerable margin, resulting in enhanced interpretability and reasoning performance. Finally, with our new data and method, we perform extensive analyses to study the effectiveness of our explanation under different settings, including multi-task learning and transfer learning. Our code and data are available at https://github.com/szzexpoi/rex.
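The core idea in the abstract — scoring pairwise correspondence between explanation words and image regions so that keywords can be grounded — can be sketched as a simple cross-attention step. This is a minimal illustration, not the paper's actual REX architecture: the function name, embedding dimensions, and region count (36, as in common Faster R-CNN pipelines) are illustrative assumptions, and the real model learns these embeddings jointly inside a transformer.

```python
import numpy as np

def ground_words_to_regions(word_emb, region_emb):
    """Toy word-region grounding: scaled dot-product scores between
    each explanation token and each detected region, normalized with
    a softmax over regions so every word gets a distribution over
    regions it may refer to. Illustrative only, not the REX model."""
    d = word_emb.shape[-1]
    scores = word_emb @ region_emb.T / np.sqrt(d)   # (T, R) similarity
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)       # softmax over regions
    return probs  # probs[t, r]: attention of word t on region r

# Random stand-ins for learned embeddings.
rng = np.random.default_rng(0)
words = rng.normal(size=(5, 64))     # 5 explanation tokens
regions = rng.normal(size=(36, 64))  # 36 detected image regions
attn = ground_words_to_regions(words, regions)
grounded_region = attn.argmax(axis=1)  # most-attended region per word
```

In a trained model, a keyword such as "dog" would place most of its attention mass on the region containing the dog; the argmax over regions is what lets the generated explanation point at concrete image evidence.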

Results

The same results are listed under the tasks Visual Question Answering (VQA) and Explanatory Visual Question Answering (verbatim duplicate listings merged below).

GQA-REX dataset:

Metric    | REX-LXMERT | REX-VisualBert
BLEU-4    | 54.79      | 54.59
CIDEr     | 466.01     | 464.2
GQA-test  | 58.15      | 57.77
GQA-val   | 78.19      | 66.16
Grounding | 70.79      | 67.95
METEOR    | 39.51      | 39.22
ROUGE-L   | 79.41      | 78.56
SPICE     | 49.98      | 46.8

SME dataset (model: REX):

Metric                | Value
#Learning Samples (N) | 16
ACC                   | 17.77
CIDEr                 | 0.89
METEOR                | 4.37
ROUGE-L               | 23.23

Related Papers

RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion (2025-07-18)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
Higher-Order Pattern Unification Modulo Similarity Relations (2025-07-17)
Exploiting Constraint Reasoning to Build Graphical Explanations for Mixed-Integer Linear Programming (2025-07-17)
LaViPlan: Language-Guided Visual Path Planning with RLVR (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)