Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Reasoning Visual Dialogs with Structural and Partial Observations

Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu

2019-04-11 · CVPR 2019 · Visual Dialog
Paper · PDF · Code (official)

Abstract

We propose a novel model to address the task of Visual Dialog, which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers). Based on this, we proceed to propose a differentiable graph neural network (GNN) solution that approximates this process. Experimental results on the VisDial and VisDial-Q datasets show that our model outperforms comparative methods. We also observe that our method can infer the underlying dialog structure for better dialog reasoning.
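The core idea in the abstract can be illustrated with a minimal sketch (not the authors' implementation): dialog entities are observed nodes, the answer is one unobserved node, and inference alternates between estimating soft edge weights to the unobserved node (E-step, since the graph structure is unknown) and updating its value by weighted message passing (M-step). All function names, dimensions, and the dot-product affinity are hypothetical simplifications.

```python
# Illustrative EM-style inference over a graph with one unobserved node.
# This is a simplified sketch of the paper's setup, not its actual model.
import numpy as np

def em_infer_missing_node(observed, n_iters=10):
    """observed: (n, d) array of embeddings for observed dialog entities.
    Returns (estimate for the missing answer node, soft edge weights)."""
    n, d = observed.shape
    missing = observed.mean(axis=0)  # init: average of observed nodes
    weights = np.full(n, 1.0 / n)
    for _ in range(n_iters):
        # E-step: soft edge weights from the missing node to each observed
        # node, via a softmax over dot-product affinities (the graph
        # structure is unknown, so it is estimated rather than given).
        scores = observed @ missing
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # M-step: update the missing node's value as the weighted
        # aggregate of messages from the observed nodes.
        missing = weights @ observed
    return missing, weights

rng = np.random.default_rng(0)
obs = rng.normal(size=(5, 8))        # 5 observed entities, 8-dim embeddings
answer, edge_weights = em_infer_missing_node(obs)
```

The paper's GNN then makes this alternating procedure differentiable end to end; here the two steps are just run to convergence for clarity.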

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Dialogue | VisDial v0.9 val | MRR | 0.6285 | GNN |
| Dialogue | VisDial v0.9 val | Mean Rank | 4.57 | GNN |
| Dialogue | VisDial v0.9 val | R@1 | 48.95 | GNN |
| Dialogue | VisDial v0.9 val | R@10 | 88.36 | GNN |
| Dialogue | VisDial v0.9 val | R@5 | 79.65 | GNN |
| Dialogue | Visual Dialog v1.0 test-std | MRR (x 100) | 61.37 | GNN |
| Dialogue | Visual Dialog v1.0 test-std | Mean | 4.57 | GNN |
| Dialogue | Visual Dialog v1.0 test-std | NDCG (x 100) | 52.82 | GNN |
| Dialogue | Visual Dialog v1.0 test-std | R@1 | 47.33 | GNN |
| Dialogue | Visual Dialog v1.0 test-std | R@10 | 87.83 | GNN |
| Dialogue | Visual Dialog v1.0 test-std | R@5 | 77.98 | GNN |
| Visual Dialog | VisDial v0.9 val | MRR | 0.6285 | GNN |
| Visual Dialog | VisDial v0.9 val | Mean Rank | 4.57 | GNN |
| Visual Dialog | VisDial v0.9 val | R@1 | 48.95 | GNN |
| Visual Dialog | VisDial v0.9 val | R@10 | 88.36 | GNN |
| Visual Dialog | VisDial v0.9 val | R@5 | 79.65 | GNN |
| Visual Dialog | Visual Dialog v1.0 test-std | MRR (x 100) | 61.37 | GNN |
| Visual Dialog | Visual Dialog v1.0 test-std | Mean | 4.57 | GNN |
| Visual Dialog | Visual Dialog v1.0 test-std | NDCG (x 100) | 52.82 | GNN |
| Visual Dialog | Visual Dialog v1.0 test-std | R@1 | 47.33 | GNN |
| Visual Dialog | Visual Dialog v1.0 test-std | R@10 | 87.83 | GNN |
| Visual Dialog | Visual Dialog v1.0 test-std | R@5 | 77.98 | GNN |
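The MRR, R@k, and Mean Rank figures above are standard retrieval metrics: each question comes with a ranked list of candidate answers, and the metrics summarize the 1-based rank of the ground-truth answer across questions. A small sketch of how such numbers are computed (the `ranks` list here is made-up example data, not the paper's):

```python
# Compute VisDial-style ranking metrics from ground-truth answer ranks.
# `ranks` holds, per question, the 1-based rank of the correct answer
# among the candidate answers.
import numpy as np

def ranking_metrics(ranks):
    ranks = np.asarray(ranks, dtype=float)
    return {
        "MRR": float(np.mean(1.0 / ranks)),          # mean reciprocal rank
        "R@1": float(np.mean(ranks <= 1) * 100),     # % with answer ranked 1st
        "R@5": float(np.mean(ranks <= 5) * 100),     # % with answer in top 5
        "R@10": float(np.mean(ranks <= 10) * 100),   # % with answer in top 10
        "Mean Rank": float(np.mean(ranks)),          # average rank (lower = better)
    }

m = ranking_metrics([1, 3, 2, 12, 1])  # hypothetical ranks for 5 questions
```

Note that the v0.9 rows report MRR on a 0-1 scale while the v1.0 rows report it scaled by 100; NDCG (v1.0 only) additionally credits all candidate answers annotated as relevant, which this sketch omits.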

Related Papers

- V$^2$Dial: Unification of Video and Visual Dialog via Multimodal Experts (2025-03-03)
- V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts (2025-01-01)
- Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations (2024-08-13)
- ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report (2024-07-13)
- Hawk: Learning to Understand Open-World Video Anomalies (2024-05-27)
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models (2024-03-27)
- FlexCap: Describe Anything in Images in Controllable Detail (2024-03-18)
- $\mathbb{VD}$-$\mathbb{GR}$: Boosting $\mathbb{V}$isual $\mathbb{D}$ialog with Cascaded Spatial-Temporal Multi-Modal $\mathbb{GR}$aphs (2023-10-25)