Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Factor Graph Attention

Idan Schwartz, Seunghak Yu, Tamir Hazan, Alexander Schwing

Published: 2019-04-11 · CVPR 2019
Tasks: Question Answering · Visual Dialog · Visual Question Answering (VQA) · Visual Question Answering
Method: Graph Attention
Links: Paper · PDF · Code (official)

Abstract

Dialog is an effective way to exchange information, but subtle details and nuances are extremely important. While significant progress has paved a path to address visual dialog with algorithms, details and nuances remain a challenge. Attention mechanisms have demonstrated compelling results to extract details in visual question answering and also provide a convincing framework for visual dialog due to their interpretability and effectiveness. However, the many data utilities that accompany visual dialog challenge existing attention techniques. We address this issue and develop a general attention mechanism for visual dialog which operates on any number of data utilities. To this end, we design a factor graph based attention mechanism which combines any number of utility representations. We illustrate the applicability of the proposed approach on the challenging and recently introduced VisDial datasets, outperforming recent state-of-the-art methods by 1.1% for VisDial0.9 and by 2% for VisDial1.0 on MRR. Our ensemble model improved the MRR score on VisDial1.0 by more than 6%.
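The core idea of the abstract, attention over an arbitrary number of data utilities (image regions, question words, dialog history, etc.) where each utility's attention scores combine a local term with pairwise interaction factors against every other utility, can be sketched as follows. This is a minimal illustrative toy, not the paper's actual parameterization: the weight names, the mean-marginalization over pairwise scores, and the final weighted-sum readout are all simplifying assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def factor_graph_attention(utilities, W_local, W_pair):
    """Toy factor-graph-style attention over multiple utilities.

    utilities: list of (n_i, d) arrays, one per data utility
               (e.g. image regions, question words, history turns).
    W_local:   list of (d,) vectors scoring each utility on its own
               (unary factors).
    W_pair:    dict {(i, j): (d, d) array} of pairwise factors.
    Returns one (d,) attention-weighted summary per utility.
    """
    summaries = []
    for i, U in enumerate(utilities):
        s = U @ W_local[i]                       # unary scores, shape (n_i,)
        for j, V in enumerate(utilities):
            M = W_pair.get((i, j))
            if i == j or M is None:
                continue
            # Pairwise scores between every element of utility i and
            # every element of utility j, marginalized over j by a mean
            # (a simplification of the paper's message passing).
            s = s + (U @ M @ V.T).mean(axis=1)
        attn = softmax(s)                        # attention over utility i
        summaries.append(attn @ U)               # (d,) attended summary
    return summaries
```

For example, with two utilities of 4 and 6 elements in an 8-dimensional space, the function returns two 8-dimensional attended vectors, one per utility.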

Results

| Task          | Dataset                     | Metric       | Value | Model              |
|---------------|-----------------------------|--------------|-------|--------------------|
| Visual Dialog | VisDial v0.9 val            | MRR          | 68.92 | 9xFGA (VGG)        |
| Visual Dialog | VisDial v0.9 val            | R@1          | 55.16 | 9xFGA (VGG)        |
| Visual Dialog | VisDial v0.9 val            | R@5          | 86.26 | 9xFGA (VGG)        |
| Visual Dialog | VisDial v0.9 val            | R@10         | 92.95 | 9xFGA (VGG)        |
| Visual Dialog | VisDial v0.9 val            | Mean Rank    | 3.39  | 9xFGA (VGG)        |
| Visual Dialog | Visual Dialog v1.0 test-std | NDCG (x 100) | 57.2  | 5xFGA (F-RCNNx101) |
| Visual Dialog | Visual Dialog v1.0 test-std | MRR (x 100)  | 69.3  | 5xFGA (F-RCNNx101) |
| Visual Dialog | Visual Dialog v1.0 test-std | R@1          | 55.65 | 5xFGA (F-RCNNx101) |
| Visual Dialog | Visual Dialog v1.0 test-std | R@5          | 86.73 | 5xFGA (F-RCNNx101) |
| Visual Dialog | Visual Dialog v1.0 test-std | R@10         | 94.05 | 5xFGA (F-RCNNx101) |
| Visual Dialog | Visual Dialog v1.0 test-std | Mean Rank    | 3.14  | 5xFGA (F-RCNNx101) |

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility (2025-07-16)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)