Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling

Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyung-Su Kim, Sungjin Kim, In So Kweon

2020-02-03 · Image Captioning · Visual Storytelling

Abstract

Visual storytelling is the task of creating a short story based on photo streams. Unlike existing visual captioning, storytelling aims to contain not only factual descriptions but also human-like narration and semantics. However, the VIST dataset consists only of a small, fixed number of photos per story. Therefore, the main challenge of visual storytelling is to fill in the visual gap between photos with a narrative and imaginative story. In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap. During training, one or more photos are randomly omitted from the input stack, and we train the network to produce a full, plausible story even with the missing photo(s). Furthermore, we propose a hide-and-tell model for visual storytelling, which is designed to learn non-local relations across the photo streams and to refine and improve conventional RNN-based models. In experiments, we show that our hide-and-tell scheme and network design are indeed effective at storytelling, and that our model outperforms previous state-of-the-art methods on automatic metrics. Finally, we qualitatively show the learned ability to interpolate a storyline over visual gaps.
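The "hide" step of the training scheme described above (randomly omitting one or more photos from the input stack so the model must narrate across the gap) can be sketched as follows. This is a minimal illustration of the masking idea only; the function and parameter names are assumptions, not the authors' implementation.

```python
import random

def hide_photos(photo_stream, max_hidden=1):
    """Randomly omit between 1 and `max_hidden` photos from the
    input stack, always keeping at least one photo. Returns the
    remaining photos and the indices that were hidden, so the
    model can still be supervised on the full five-sentence story.
    Names here are illustrative, not the paper's code.
    """
    n = len(photo_stream)
    k = random.randint(1, min(max_hidden, n - 1))
    hidden = set(random.sample(range(n), k))
    kept = [p for i, p in enumerate(photo_stream) if i not in hidden]
    return kept, sorted(hidden)
```

During training the loss would still be computed against the complete ground-truth story, which is what forces the network to imagine the narrative for the hidden photos.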

Results

Task                     Dataset  Metric   Value  Model
Text Generation          VIST     BLEU-1   64.4   INet
Text Generation          VIST     BLEU-2   0.401  INet
Text Generation          VIST     BLEU-3   23.9   INet
Text Generation          VIST     BLEU-4   14.7   INet
Text Generation          VIST     CIDEr    10     INet
Text Generation          VIST     METEOR   35.6   INet
Text Generation          VIST     ROUGE-L  29.7   INet
Data-to-Text Generation  VIST     BLEU-1   64.4   INet
Data-to-Text Generation  VIST     BLEU-2   0.401  INet
Data-to-Text Generation  VIST     BLEU-3   23.9   INet
Data-to-Text Generation  VIST     BLEU-4   14.7   INet
Data-to-Text Generation  VIST     CIDEr    10     INet
Data-to-Text Generation  VIST     METEOR   35.6   INet
Data-to-Text Generation  VIST     ROUGE-L  29.7   INet
Visual Storytelling      VIST     BLEU-1   64.4   INet
Visual Storytelling      VIST     BLEU-2   0.401  INet
Visual Storytelling      VIST     BLEU-3   23.9   INet
Visual Storytelling      VIST     BLEU-4   14.7   INet
Visual Storytelling      VIST     CIDEr    10     INet
Visual Storytelling      VIST     METEOR   35.6   INet
Visual Storytelling      VIST     ROUGE-L  29.7   INet
Story Generation         VIST     BLEU-1   64.4   INet
Story Generation         VIST     BLEU-2   0.401  INet
Story Generation         VIST     BLEU-3   23.9   INet
Story Generation         VIST     BLEU-4   14.7   INet
Story Generation         VIST     CIDEr    10     INet
Story Generation         VIST     METEOR   35.6   INet
Story Generation         VIST     ROUGE-L  29.7   INet

Related Papers

Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval (2025-06-28)
Shape2Animal: Creative Animal Generation from Natural Silhouettes (2025-06-25)
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent (2025-06-21)
HalLoc: Token-level Localization of Hallucinations for Vision Language Models (2025-06-12)
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs (2025-06-11)
A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning (2025-06-11)
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning (2025-06-11)