Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

Xichen Pan, Pengda Qin, Yuhong Li, Hui Xue, Wenhu Chen

2022-11-20

Story Visualization, Story Continuation

Abstract

Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity. Most recent work focuses on synthesizing independent images; real-world applications, however, often need a series of coherent images for storytelling. In this work, we focus on the story visualization and story continuation tasks and propose AR-LDM, a latent diffusion model auto-regressively conditioned on history captions and generated images. Moreover, AR-LDM can generalize to new characters through adaptation. To the best of our knowledge, this is the first work to successfully leverage diffusion models for coherent visual story synthesis. Quantitative results show that AR-LDM achieves SoTA FID scores on PororoSV, FlintstonesSV, and the newly introduced challenging dataset VIST, which contains natural images. Large-scale human evaluations show that AR-LDM has superior performance in terms of quality, relevance, and consistency.
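The auto-regressive conditioning described in the abstract can be pictured as a simple loop: each frame is generated from its own caption plus every earlier caption and the frames already produced for them. The sketch below is only an illustration of that control flow, not the authors' implementation. The names `generate_story`, `generate_frame`, and `toy_generator` are hypothetical, and a string stands in for an image so the example runs without any model.

```python
from typing import Callable, List, Tuple

# A "frame" is whatever the image generator returns; a string stands in here.
Frame = str
History = List[Tuple[str, Frame]]  # (caption, generated frame) pairs so far


def generate_story(
    captions: List[str],
    generate_frame: Callable[[str, History], Frame],
) -> List[Frame]:
    """Generate one frame per caption, conditioning each on the prior history."""
    history: History = []
    frames: List[Frame] = []
    for caption in captions:
        # The generator sees the current caption plus all earlier
        # captions and the frames already generated for them.
        frame = generate_frame(caption, list(history))
        frames.append(frame)
        history.append((caption, frame))
    return frames


def toy_generator(caption: str, history: History) -> Frame:
    # Hypothetical stand-in for a conditioned latent diffusion sampler.
    return f"frame[{caption}|context={len(history)}]"


story = generate_story(["Pororo wakes up", "Pororo goes outside"], toy_generator)
print(story)
```

In a real system, `generate_frame` would presumably wrap a diffusion sampler whose conditioning input encodes the history; how that encoding is built is not specified on this page.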

Results

Task	Dataset	Metric	Value	Model
Text-To-Image	Pororo	FID	16.59	AR-LDM
Story Continuation	PororoSV	FID	17.4	AR-LDM
Story Continuation	FlintstonesSV	FID	19.28	AR-LDM
Story Continuation	VIST	FID	16.95	AR-LDM (SIS captions)
Story Continuation	VIST	FID	17.03	AR-LDM (DII captions)
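All values in the table are FID (Fréchet Inception Distance) scores, where lower is better. FID compares the Gaussian statistics of feature vectors extracted from real and generated images. The sketch below shows the idea in pure Python under a strong simplifying assumption: covariances are treated as diagonal, so no matrix square root is needed. Standard FID uses full covariance matrices of Inception features, so this toy version would not reproduce the numbers above. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mean_and_var(features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and unbiased variance of a set of feature vectors."""
    n = len(features)
    dim = len(features[0])
    means = [sum(row[d] for row in features) / n for d in range(dim)]
    variances = [
        sum((row[d] - means[d]) ** 2 for row in features) / (n - 1)
        for d in range(dim)
    ]
    return means, variances


def diagonal_frechet_distance(
    real: Sequence[Sequence[float]],
    generated: Sequence[Sequence[float]],
) -> float:
    """Frechet distance between two Gaussians, ASSUMING diagonal covariances."""
    mu_r, var_r = mean_and_var(real)
    mu_g, var_g = mean_and_var(generated)
    # Squared distance between the two mean vectors.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    # Diagonal case of Tr(C_r + C_g - 2 * sqrt(C_r * C_g)).
    cov_term = sum(a + b - 2.0 * math.sqrt(a * b) for a, b in zip(var_r, var_g))
    return mean_term + cov_term


real_feats = [[0.0, 0.0], [2.0, 2.0]]
shifted_feats = [[3.0, 3.0], [5.0, 5.0]]  # same spread, mean shifted by 3
same_distance = diagonal_frechet_distance(real_feats, real_feats)
shift_distance = diagonal_frechet_distance(real_feats, shifted_feats)
print(same_distance, shift_distance)
```

Identical feature sets give a distance of zero, and the distance grows as the generated statistics drift from the real ones, which is why a lower FID indicates generated images that are statistically closer to the reference set.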

Related Papers

Consistent Story Generation with Asymmetry Zigzag Sampling (2025-06-11)
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization (2025-05-30)
Object Isolated Attention for Consistent Story Visualization (2025-03-30)
Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling (2024-12-30)
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation (2024-12-10)
StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization (2024-12-10)
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization (2024-10-08)
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion (2024-07-17)