TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Differentiable Outlier Detection Enable Robust Deep Multim...

Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis

Zhu Wang, Sourav Medya, Sathya N. Ravi

2023-02-11Question AnsweringKnowledge GraphsImage-text RetrievalText RetrievalOutlier DetectionVisual ReasoningRetrievalVisual Question Answering (VQA)Language ModellingVisual Question Answering
PaperPDFCode(official)

Abstract

Often, deep network models are purely inductive during training and while performing inference on unseen data. Thus, when such models are used for predictions, it is well known that they often fail to capture the semantic information and implicit dependencies that exist among objects (or concepts) on a population level. Moreover, it is still unclear how domain or prior modal knowledge can be specified in a backpropagation friendly manner, especially in large-scale and noisy settings. In this work, we propose an end-to-end vision and language model incorporating explicit knowledge graphs. We also introduce an interactive out-of-distribution (OOD) layer using implicit network operator. The layer is used to filter noise that is brought by external knowledge base. In practice, we apply our model on several vision and language downstream tasks including visual question answering, visual reasoning, and image-text retrieval on different datasets. Our experiments show that it is possible to design models that perform similarly to state-of-art results but with significantly fewer samples and training time.

Results

TaskDatasetMetricValueModel
Visual Question Answering (VQA)VQA v2 test-devAccuracy76.8VK-OOD
Visual ReasoningNLVR2 DevAccuracy83.9VK-OOD
Visual Question AnsweringVQA v2 test-devAccuracy76.8VK-OOD

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21From Roots to Rewards: Dynamic Tree Reasoning with RL2025-07-17Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering2025-07-17Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It2025-07-17City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning2025-07-17SMART: Relation-Aware Learning of Geometric Representations for Knowledge Graphs2025-07-17LaViPlan : Language-Guided Visual Path Planning with RLVR2025-07-17HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17