Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


VisDial

Visual Dialog

Modalities: Dialog, Images, Texts
License: Creative Commons Attribution 4.0 International
Introduced: 2017-01-01

The Visual Dialog (VisDial) dataset contains human-annotated questions based on images from the MS COCO dataset. It was built by pairing two workers on Amazon Mechanical Turk to chat about an image: one was assigned the role of 'questioner' and the other acted as 'answerer'. The questioner sees only a text description of the image (i.e., an image caption from the MS COCO dataset); the original image remains hidden. The questioner's task is to ask questions about this hidden image in order to "imagine the scene better". The answerer sees the image and the caption and answers the questions. The two can continue the conversation, asking and answering questions, for up to 10 rounds.
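A dialogue collected this way amounts to a hidden image's caption plus up to 10 question-answer rounds. A minimal sketch of one such record (field names and values are illustrative, not necessarily the official VisDial annotation schema):

```python
# Illustrative structure of one VisDial-style dialogue record.
# Field names and values are hypothetical examples, not the
# official annotation format.
dialogue = {
    "image_id": 378466,  # MS COCO image id (image hidden from questioner)
    "caption": "a man riding a wave on a surfboard",
    "rounds": [
        {"question": "is the man wearing a wetsuit?", "answer": "yes"},
        {"question": "what color is the surfboard?", "answer": "white"},
        # ... up to 10 rounds in total
    ],
}

# The collection protocol caps a conversation at 10 rounds.
assert len(dialogue["rounds"]) <= 10
assert all("question" in r and "answer" in r for r in dialogue["rounds"])
```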

VisDial v1.0 contains 123K dialogues on MS COCO (2017 training set) images for the training split, 2K dialogues on validation images for the validation split, and 8K dialogues for the test-standard split. The previously released v0.5 and v0.9 versions of the dataset (corresponding to older MS COCO splits) are considered deprecated.

Source: Granular Multimodal Attention Networks for Visual Dialog
Image source: https://arxiv.org/pdf/1611.08669.pdf

Benchmarks

Image Retrieval / Recall@10 on 1 round
Image Retrieval / Recall@10 on 2 rounds
Image Retrieval / Recall@10 on 3 rounds
Image Retrieval / Hits@10 on 10 rounds

Related Benchmarks

VisDial v0.9 val / Dialogue: MRR, Mean Rank, R@1, R@5, R@10
VisDial v0.9 val / Visual Dialog: MRR, Mean Rank, R@1, R@5, R@10
VisDial v1.0 test-std / Dialogue: MRR, Mean Rank, NDCG, R@1, R@5, R@10
VisDial v1.0 test-std / Visual Dialog: MRR, Mean Rank, NDCG, R@1, R@5, R@10
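The ranking metrics above (MRR, R@k, Mean Rank) are all derived from the rank of the ground-truth answer among a list of candidate answers. A minimal sketch of how they are typically computed, assuming 1-based ranks (NDCG, which on VisDial v1.0 uses dense relevance annotations over candidates, is not shown here):

```python
# Sketch of standard answer-ranking metrics from 1-based ranks of
# the ground-truth answer among the candidate options per question.
def retrieval_metrics(gt_ranks, ks=(1, 5, 10)):
    """gt_ranks: list of 1-based ranks of the correct answer."""
    n = len(gt_ranks)
    metrics = {
        # Mean Reciprocal Rank: average of 1/rank.
        "MRR": sum(1.0 / r for r in gt_ranks) / n,
        # Mean Rank: average position of the correct answer.
        "Mean Rank": sum(gt_ranks) / n,
    }
    for k in ks:
        # R@k: fraction of questions whose answer ranks in the top k.
        metrics[f"R@{k}"] = sum(r <= k for r in gt_ranks) / n
    return metrics

# Example usage with hypothetical ranks for four questions:
scores = retrieval_metrics([1, 3, 12, 2])
```

Higher is better for MRR and R@k; lower is better for Mean Rank.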

Statistics

Papers: 159
Benchmarks: 4

Links

Homepage

Tasks

Chat-based Image Retrieval
Common Sense Reasoning
Image Retrieval
Question Answering
Visual Dialog
Visual Question Answering (VQA)