Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Zero-shot Visual Question Answering using Knowledge Graph

Zhuo Chen, Jiaoyan Chen, Yuxia Geng, Jeff Z. Pan, Zonggang Yuan, Huajun Chen

Date: 2021-07-12
Tasks: Question Answering, Knowledge Graphs, Visual Question Answering (VQA), Visual Question Answering

Abstract

Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when some component does not perform well, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, in this paper, we propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge, and present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method can achieve state-of-the-art performance in Zero-shot VQA with unseen answers, while also dramatically improving existing end-to-end models on the normal F-VQA task.
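The abstract does not spell out how the mask is built or applied, so the following is only a minimal sketch of the general idea behind a hard answer mask, not the authors' implementation. It assumes a model that produces one score per candidate answer and a hypothetical set `allowed` of candidate indices (for example, answers reachable through knowledge graph matching). The names `apply_hard_mask`, `predict`, and `allowed` are illustrative.

```python
import math
from typing import List, Sequence, Set


def apply_hard_mask(scores: Sequence[float], allowed: Set[int]) -> List[float]:
    """Keep scores of allowed candidates; set every other score to -inf.

    `allowed` is a hypothetical set of candidate answer indices. How such a
    set would be derived from a knowledge graph is not described in the
    abstract and is assumed here.
    """
    return [s if i in allowed else -math.inf for i, s in enumerate(scores)]


def predict(scores: Sequence[float], allowed: Set[int]) -> int:
    """Return the index of the highest-scoring candidate after masking."""
    masked = apply_hard_mask(scores, allowed)
    return max(range(len(masked)), key=masked.__getitem__)


# Candidate 0 stands in for a frequent training answer with an inflated score.
scores = [2.0, 0.5, 1.5, 0.1]
unmasked_choice = max(range(len(scores)), key=scores.__getitem__)
masked_choice = predict(scores, allowed={1, 2})
```

In this toy example the unmasked prediction is candidate 0, while restricting the candidates to `{1, 2}` selects candidate 2. This illustrates how a mask can counteract a bias toward answers seen during training.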

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Visual Question Answering (VQA) | F-VQA | Accuracy | 88.49 | ZS-F-VQA |
| Visual Question Answering (VQA) | F-VQA | MR | 9.17 | ZS-F-VQA |
| Visual Question Answering (VQA) | F-VQA | MRR | 0.685 | ZS-F-VQA |
| Visual Question Answering (VQA) | F-VQA | Top-1 Accuracy | 58.27 | ZS-F-VQA |
| Visual Question Answering (VQA) | F-VQA | Top-3 Accuracy | 76.51 | ZS-F-VQA |
| Visual Question Answering (VQA) | ZS-F-VQA | Top-1 Accuracy | 29.39 | SAN † - hard mask |
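
The MR, MRR, and Top-k accuracy metrics listed in the results are standard ranking metrics. As a rough sketch, they can be computed from the 1-based rank of the correct answer for each question. The code below assumes the usual definitions (mean rank, mean reciprocal rank, and the percentage of questions whose correct answer ranks within the top k); the page does not state the exact evaluation protocol, and the function names are illustrative.

```python
from typing import Dict, Sequence


def rank_of_answer(scores: Sequence[float], gold: int) -> int:
    """1-based rank of the gold answer when candidates are sorted by descending score."""
    gold_score = scores[gold]
    # Count candidates that score strictly higher than the gold answer.
    return 1 + sum(1 for s in scores if s > gold_score)


def ranking_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 3)) -> Dict[str, float]:
    """Compute MR, MRR, and Top-k accuracy (in percent) from 1-based ranks."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank (lower is better)
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank (higher is better)
    }
    for k in ks:
        # Share of questions whose gold answer is ranked within the top k.
        metrics[f"Top-{k}"] = 100.0 * sum(1 for r in ranks if r <= k) / n
    return metrics


# Toy ranks for four questions; these are not values from the paper.
example = ranking_metrics([1, 2, 4, 1])
```

For the toy ranks `[1, 2, 4, 1]`, this gives an MR of 2.0, an MRR of 0.6875, a Top-1 accuracy of 50.0, and a Top-3 accuracy of 75.0.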

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
SMART: Relation-Aware Learning of Geometric Representations for Knowledge Graphs (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility (2025-07-16)