


Iterative Context-Aware Graph Inference for Visual Dialog

Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang

2020-04-05 | CVPR 2020 | Visual Dialog, Graph Embedding, Graph Attention

Abstract

Visual dialog is a challenging task that requires comprehending the semantic dependencies among implicit visual and textual contexts. The task can be cast as relation inference in a graphical model with sparse contexts and an unknown graph structure (relation descriptor), so modeling the underlying context-aware relation inference is critical. To this end, we propose a novel Context-Aware Graph (CAG) neural network. Each node in the graph corresponds to a joint semantic feature that includes both object-based (visual) and history-related (textual) context representations. The graph structure (the relations in the dialog) is iteratively updated using an adaptive top-$K$ message passing mechanism. Specifically, in every message passing step, each node selects its $K$ most relevant nodes and receives messages only from them. After the update, we apply graph attention over all the nodes to obtain the final graph embedding and infer the answer. In CAG, each node has dynamic relations in the graph (a different set of $K$ related neighbor nodes), and only the most relevant nodes contribute to the context-aware relational graph inference. Experimental results on the VisDial v0.9 and v1.0 datasets show that CAG outperforms comparative methods. Visualization results further validate the interpretability of our method.
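The top-$K$ message passing and graph attention steps described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the dot-product relevance score, the exclusion of a node from its own neighbor set, the equal-weight averaging update, and all function names (`top_k_message_passing`, `attention_pool`, `cag_sketch`) are assumptions made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def top_k_message_passing(nodes, k):
    """One step: each node receives messages only from its k most relevant nodes.

    nodes: (n, d) array of node features; requires 1 <= k <= n - 1.
    Returns the updated (n, d) features and the (n, k) neighbor indices.
    """
    # Assumption: relevance is a plain dot product between node features.
    scores = nodes @ nodes.T
    # Assumption: a node never selects itself as a neighbor.
    np.fill_diagonal(scores, -np.inf)
    # Indices of the k highest-scoring neighbors for every node (unordered).
    top_idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    # Normalize the relevance over the selected neighbors only.
    weights = softmax(top_scores, axis=1)
    messages = (weights[:, :, None] * nodes[top_idx]).sum(axis=1)
    # Assumption: the update is an equal-weight average of node and message.
    return 0.5 * (nodes + messages), top_idx


def attention_pool(nodes, query):
    """Attention over all nodes, producing one graph embedding of size d."""
    alpha = softmax(nodes @ query)
    return alpha @ nodes


def cag_sketch(nodes, query, k=2, steps=3):
    """Iterate top-k message passing, then pool with attention."""
    neighbors = None
    for _ in range(steps):
        # The neighbor set is recomputed every step, so relations are dynamic.
        nodes, neighbors = top_k_message_passing(nodes, k)
    return attention_pool(nodes, query), neighbors


rng = np.random.default_rng(0)
embedding, neighbors = cag_sketch(
    rng.normal(size=(5, 4)), rng.normal(size=4), k=2, steps=3
)
print(embedding.shape, neighbors.shape)
```

Because the neighbor indices are recomputed from the updated features at every step, the set of $K$ neighbors for a node can change between iterations, which is the "dynamic relations" property the abstract refers to.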

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Dialogue | VisDial v0.9 val | MRR | 0.6756 | CAG |
| Dialogue | VisDial v0.9 val | Mean Rank | 3.75 | CAG |
| Dialogue | VisDial v0.9 val | R@1 | 54.64 | CAG |
| Dialogue | VisDial v0.9 val | R@10 | 91.48 | CAG |
| Dialogue | VisDial v0.9 val | R@5 | 83.72 | CAG |
| Dialogue | Visual Dialog v1.0 test-std | MRR (x 100) | 63.49 | CAG |
| Dialogue | Visual Dialog v1.0 test-std | Mean | 4.11 | CAG |
| Dialogue | Visual Dialog v1.0 test-std | NDCG (x 100) | 56.64 | CAG |
| Dialogue | Visual Dialog v1.0 test-std | R@1 | 49.85 | CAG |
| Dialogue | Visual Dialog v1.0 test-std | R@10 | 90.15 | CAG |
| Dialogue | Visual Dialog v1.0 test-std | R@5 | 80.63 | CAG |
| Visual Dialog | VisDial v0.9 val | MRR | 0.6756 | CAG |
| Visual Dialog | VisDial v0.9 val | Mean Rank | 3.75 | CAG |
| Visual Dialog | VisDial v0.9 val | R@1 | 54.64 | CAG |
| Visual Dialog | VisDial v0.9 val | R@10 | 91.48 | CAG |
| Visual Dialog | VisDial v0.9 val | R@5 | 83.72 | CAG |
| Visual Dialog | Visual Dialog v1.0 test-std | MRR (x 100) | 63.49 | CAG |
| Visual Dialog | Visual Dialog v1.0 test-std | Mean | 4.11 | CAG |
| Visual Dialog | Visual Dialog v1.0 test-std | NDCG (x 100) | 56.64 | CAG |
| Visual Dialog | Visual Dialog v1.0 test-std | R@1 | 49.85 | CAG |
| Visual Dialog | Visual Dialog v1.0 test-std | R@10 | 90.15 | CAG |
| Visual Dialog | Visual Dialog v1.0 test-std | R@5 | 80.63 | CAG |

Related Papers

- SMART: Relation-Aware Learning of Geometric Representations for Knowledge Graphs (2025-07-17)
- Catching Bid-rigging Cartels with Graph Attention Neural Networks (2025-07-16)
- Wavelet-Enhanced Neural ODE and Graph Attention for Interpretable Energy Forecasting (2025-07-14)
- Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence (2025-07-02)
- Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection (2025-06-26)
- Metapath-based Hyperbolic Contrastive Learning for Heterogeneous Graph Embedding (2025-06-20)
- Accessible Gesture-Driven Augmented Reality Interaction System (2025-06-18)
- AST-Enhanced or AST-Overloaded? The Surprising Impact of Hybrid Graph Representations on Code Clone Detection (2025-06-17)