Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Dialog

Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra

Published 2016-11-26 · CVPR 2017
Tasks: AI Agent · Visual Dialog · Chatbot · Retrieval
Links: Paper · PDF · Code (official implementation and community implementations)

Abstract

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from any specific downstream task to serve as a general test of machine intelligence, while being grounded in vision enough to allow objective evaluation of individual responses and to benchmark progress. We develop a novel two-person chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial v0.9 has been released and contains one dialog with 10 question-answer pairs on each of ~120k images from COCO, for a total of ~1.2M dialog question-answer pairs. We introduce a family of neural encoder-decoder models for Visual Dialog with three encoders -- Late Fusion, Hierarchical Recurrent Encoder, and Memory Network -- and two decoders (generative and discriminative), which outperform a number of sophisticated baselines. We propose a retrieval-based evaluation protocol for Visual Dialog in which the AI agent is asked to sort a set of candidate answers and is evaluated on metrics such as the mean reciprocal rank of the human response. We quantify the gap between machine and human performance on the Visual Dialog task via human studies. Putting it all together, we demonstrate the first 'visual chatbot'! Our dataset, code, trained models, and visual chatbot are available at https://visualdialog.org
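The retrieval-based evaluation described above can be sketched as follows. This is an illustrative implementation, not the official evaluation code: it assumes each question comes with a fixed set of candidate answers and that the model has assigned the human response a 1-based rank within that set.

```python
# Sketch of the retrieval-based evaluation protocol (assumed, not the official
# VisDial evaluation code): given the 1-based rank of the human response among
# the candidate answers for each question, compute MRR, R@k, and mean rank.

def retrieval_metrics(ranks, ks=(1, 5, 10)):
    """Compute mean reciprocal rank, recall@k (as a percentage), and mean rank.

    ranks: iterable of 1-based ranks of the human response per question.
    """
    ranks = list(ranks)
    n = len(ranks)
    metrics = {
        "MRR": sum(1.0 / r for r in ranks) / n,
        "Mean Rank": sum(ranks) / n,
    }
    for k in ks:
        # Fraction of questions where the human response ranks in the top k.
        metrics[f"R@{k}"] = 100.0 * sum(r <= k for r in ranks) / n
    return metrics

# Hypothetical example: ranks of the human answer for four questions.
print(retrieval_metrics([1, 3, 12, 2]))
```

Note that higher is better for MRR and R@k, while lower is better for mean rank, which is why the results tables report both directions.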

Results

Results are listed identically under both the Dialogue and Visual Dialog tasks; the duplicate listing is collapsed below. The source lists two result sets for HRE-QIH-D on v0.9 and two for MN-QIH-D on v1.0; both are kept as given.

VisDial v0.9 val

Model     | MRR    | Mean Rank | R@1   | R@5   | R@10
MN-QIH-D  | 0.5965 | 5.46      | 45.55 | 76.22 | 85.37
HRE-QIH-D | 0.5846 | 5.72      | 44.67 | 74.5  | 84.22
HRE-QIH-D | 0.5807 | 5.78      | 43.82 | 74.68 | 84.07

Visual Dialog v1.0 test-std

Model     | NDCG (x100) | MRR (x100) | R@1   | R@5   | R@10  | Mean Rank
MN-QIH-D  | 47.5        | 55.5       | 40.98 | 72.3  | 83.3  | 5.92
HRE-QIH-D | 45.5        | 54.2       | 39.93 | 70.45 | 81.5  | 6.41
MN-QIH-D  | 45.3        | 55.4       | 40.95 | 72.45 | 82.83 | 5.95

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Context-Aware Search and Retrieval Over Erasure Channels (2025-07-16)
Seq vs Seq: An Open Suite of Paired Encoders and Decoders (2025-07-15)