A Survey on Interpretable Cross-modal Reasoning

Dizhan Xue, Shengsheng Qian, Zuyi Zhou, Changsheng Xu

2023-09-05

Tasks: Cross-Modal Retrieval, Factual Visual Question Answering, Explanation Generation, Decision Making, Science Question Answering, Visual Reasoning, Phrase Grounding, Fake News Detection, Visual Commonsense Reasoning, Visual Question Answering, Image-guided Story Ending Generation

Abstract

In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning across different modalities, has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics. As the deployment of AI systems becomes more ubiquitous, the demand for transparency and comprehensibility in these systems' decision-making processes has intensified. This survey delves into the realm of interpretable cross-modal reasoning (I-CMR), where the objective is not only to achieve high predictive performance but also to provide human-understandable explanations for the results. This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR. Furthermore, this survey reviews the existing CMR datasets with annotations for explanations. Finally, this survey summarizes the challenges for I-CMR and discusses potential future directions. In conclusion, this survey aims to catalyze the progress of this emerging research area by providing researchers with a panoramic and comprehensive perspective, illuminating the state of the art and discerning the opportunities. The summarized methods, datasets, and other resources are available at https://github.com/ZuyiZhou/Awesome-Interpretable-Cross-modal-Reasoning.

Related Papers

Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion (2025-07-18)
Higher-Order Pattern Unification Modulo Similarity Relations (2025-07-17)
Exploiting Constraint Reasoning to Build Graphical Explanations for Mixed-Integer Linear Programming (2025-07-17)
LaViPlan : Language-Guided Visual Path Planning with RLVR (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Acting and Planning with Hierarchical Operational Models on a Mobile Robot: A Study with RAE+UPOM (2025-07-15)
CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking (2025-07-15)
Detección y Cuantificación de Erosión Fluvial con Visión Artificial (2025-07-15)