Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Informative Visual Storytelling with Cross-modal Rules

Jiacheng Li, Haizhou Shi, Siliang Tang, Fei Wu, Yueting Zhuang

2019-07-07 · Story Generation · Visual Storytelling
Paper · PDF · Code (official)

Abstract

Existing methods in the Visual Storytelling field often generate general descriptions, while many meaningful contents of the images remain unnoticed. This failure to generate informative stories can be attributed to the model's inability to capture enough meaningful concepts. These concepts include entities, attributes, actions, and events, which are in some cases crucial to grounded storytelling. To solve this problem, we propose a method that mines cross-modal rules to help the model infer such informative concepts given certain visual input. We first build multimodal transactions by concatenating the CNN activations and the word indices. We then apply an association rule mining algorithm to mine the cross-modal rules, which are used for concept inference. With the help of these rules, the generated stories are more grounded and informative. In addition, the proposed method offers interpretability, expandability, and transferability, indicating potential for wider application. Finally, we leverage the inferred concepts in an encoder-decoder framework with an attention mechanism. Experiments on the VIsual StoryTelling (VIST) dataset demonstrate the effectiveness of our approach in terms of both automatic metrics and human evaluation. Additional experiments show that the mined cross-modal rules, used as additional knowledge, help the model achieve better performance when trained on a small dataset.
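The core pipeline in the abstract (build multimodal transactions from thresholded CNN activations and story word indices, then mine association rules whose antecedents are visual items and whose consequents are words) can be sketched in a few lines. This is a minimal toy illustration under assumed conventions, not the paper's implementation: the sample transactions, the "v:"/"w:" item prefixes, and the support/confidence thresholds are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy transactions: each mixes visual items ("v:*", e.g. from
# thresholded CNN activations) with word items ("w:*") from the paired story.
transactions = [
    {"v:dog", "v:grass", "w:dog", "w:park"},
    {"v:dog", "v:ball", "w:dog", "w:play"},
    {"v:dog", "v:grass", "w:dog", "w:run"},
    {"v:cake", "v:candle", "w:birthday", "w:cake"},
]

def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Mine cross-modal rules of the form: visual item -> word item.

    support(v, w)    = count(v and w together) / number of transactions
    confidence(v->w) = count(v and w together) / count(v)
    """
    n = len(transactions)
    counts = defaultdict(int)
    for t in transactions:
        visual = [i for i in t if i.startswith("v:")]
        words = [i for i in t if i.startswith("w:")]
        for v in visual:
            counts[(v,)] += 1
            for w in words:
                counts[(v, w)] += 1
    rules = []
    for key, c in counts.items():
        if len(key) == 2 and c / n >= min_support:
            confidence = c / counts[(key[0],)]
            if confidence >= min_confidence:
                rules.append((key[0], key[1], confidence))
    return rules

print(mine_rules(transactions))
```

At inference time, a rule such as ("v:dog" -> "w:dog") would let the model surface the concept "dog" whenever the corresponding visual item fires, even if the decoder would otherwise fall back on a generic description.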

Results

Task                    | Dataset | Metric  | Value | Model
------------------------|---------|---------|-------|------
Text Generation         | VIST    | BLEU-1  | 63.8  | VSCMR
Text Generation         | VIST    | BLEU-4  | 14.3  | VSCMR
Text Generation         | VIST    | CIDEr   | 9     | VSCMR
Text Generation         | VIST    | METEOR  | 35.5  | VSCMR
Text Generation         | VIST    | ROUGE-L | 30.2  | VSCMR
Data-to-Text Generation | VIST    | BLEU-1  | 63.8  | VSCMR
Data-to-Text Generation | VIST    | BLEU-4  | 14.3  | VSCMR
Data-to-Text Generation | VIST    | CIDEr   | 9     | VSCMR
Data-to-Text Generation | VIST    | METEOR  | 35.5  | VSCMR
Data-to-Text Generation | VIST    | ROUGE-L | 30.2  | VSCMR
Visual Storytelling     | VIST    | BLEU-1  | 63.8  | VSCMR
Visual Storytelling     | VIST    | BLEU-4  | 14.3  | VSCMR
Visual Storytelling     | VIST    | CIDEr   | 9     | VSCMR
Visual Storytelling     | VIST    | METEOR  | 35.5  | VSCMR
Visual Storytelling     | VIST    | ROUGE-L | 30.2  | VSCMR
Story Generation        | VIST    | BLEU-1  | 63.8  | VSCMR
Story Generation        | VIST    | BLEU-4  | 14.3  | VSCMR
Story Generation        | VIST    | CIDEr   | 9     | VSCMR
Story Generation        | VIST    | METEOR  | 35.5  | VSCMR
Story Generation        | VIST    | ROUGE-L | 30.2  | VSCMR

Related Papers

- Compressed and Smooth Latent Space for Text Diffusion Modeling (2025-06-26)
- Shape2Animal: Creative Animal Generation from Natural Silhouettes (2025-06-25)
- JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent (2025-06-21)
- StoryWriter: A Multi-Agent Framework for Long Story Generation (2025-06-19)
- VINCIE: Unlocking In-context Image Editing from Video (2025-06-12)
- Can LLMs Generate Good Stories? Insights and Challenges from a Narrative Planning Perspective (2025-06-11)
- Consistent Story Generation with Asymmetry Zigzag Sampling (2025-06-11)
- Counterfactual reasoning: an analysis of in-context emergence (2025-06-05)