
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

Xichen Pan, Pengda Qin, Yuhong Li, Hui Xue, Wenhu Chen

Date: 2022-11-20 | Tasks: Story Visualization, Story Continuation
Paper · PDF · Code (official)

Abstract

Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capability. Most recent work focuses on synthesizing independent images, whereas real-world applications often need to generate a series of coherent images for storytelling. In this work, we focus on the story visualization and story continuation tasks and propose AR-LDM, a latent diffusion model auto-regressively conditioned on history captions and previously generated images. Moreover, AR-LDM can generalize to new characters through adaptation. To the best of our knowledge, this is the first work to successfully leverage diffusion models for coherent visual story synthesis. Quantitative results show that AR-LDM achieves state-of-the-art FID scores on PororoSV, FlintstonesSV, and VIST, a newly introduced challenging dataset containing natural images. Large-scale human evaluations show that AR-LDM delivers superior performance in terms of quality, relevance, and consistency.
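The abstract describes generation that is auto-regressive over frames: each new image is conditioned on the current caption together with the history of captions and previously generated images. The control flow can be sketched as below. This is a minimal sketch under stated assumptions: `generate_story` and `generate_frame` are hypothetical names, and `generate_frame` is a stand-in for the conditioned latent diffusion sampler. How AR-LDM actually encodes the history and feeds it to the denoiser is not shown and is not implied by this sketch.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")  # placeholder for an image type (tensor, PIL image, ...)


def generate_story(
    captions: Sequence[str],
    generate_frame: Callable[[str, Sequence[str], Sequence[Frame]], Frame],
) -> List[Frame]:
    """Generate one frame per caption, auto-regressively.

    Frame j is produced from caption j, captions 0..j-1, and frames 0..j-1,
    i.e. the factorization p(x_1..x_L | c) = prod_j p(x_j | x_<j, c_<=j).
    `generate_frame` is a hypothetical stand-in for a conditioned sampler.
    """
    frames: List[Frame] = []
    for j, caption in enumerate(captions):
        history_captions = list(captions[:j])
        history_frames = list(frames)  # copy so the sampler cannot mutate history
        frames.append(generate_frame(caption, history_captions, history_frames))
    return frames


def toy_generator(caption: str, hist_caps: Sequence[str], hist_frames: Sequence[str]) -> str:
    """Toy sampler: reports what it was conditioned on instead of drawing an image."""
    return f"frame[{caption}|{len(hist_caps)} caps, {len(hist_frames)} frames]"
```

For example, `generate_story(["a", "b", "c"], toy_generator)` returns three strings whose history counts grow from 0 to 2, which makes the growing conditioning context visible. Copying the history before each call keeps earlier frames fixed once generated, matching the left-to-right order of the factorization.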

Results

Task | Dataset | Metric | Value | Model
Text-To-Image | Pororo | FID | 16.59 | AR-LDM
Story Continuation | PororoSV | FID | 17.4 | AR-LDM
Story Continuation | FlintstonesSV | FID | 19.28 | AR-LDM
Story Continuation | VIST | FID | 16.95 | AR-LDM (SIS captions)
Story Continuation | VIST | FID | 17.03 | AR-LDM (DII captions)
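All results above are FID (Fréchet Inception Distance) scores, where lower is better. FID fits a Gaussian to image features of real and of generated images and reports the Fréchet distance between the two: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}). The sketch below computes only that distance from feature arrays. Feature extraction (conventionally an Inception-v3 network) is assumed and not shown, the function names are hypothetical, and the evaluation code behind these numbers may differ in details such as feature layer, sample count, and preprocessing.

```python
import numpy as np


def _sqrt_psd(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(mat)
    # Scale eigenvector columns by sqrt(eigenvalue), then rebuild V diag(s) V^T.
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).

    Tr((S1 S2)^{1/2}) is computed as Tr((S1^{1/2} S2 S1^{1/2})^{1/2}): both
    matrices share eigenvalues, and the latter is symmetric, so eigvalsh applies.
    """
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2
    s1_half = _sqrt_psd(sigma1)
    middle = s1_half @ sigma2 @ s1_half
    middle = (middle + middle.T) / 2.0  # symmetrize to absorb rounding error
    eigvals = np.clip(np.linalg.eigvalsh(middle), 0.0, None)
    tr_covmean = float(np.sum(np.sqrt(eigvals)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(real_feats, fake_feats) -> float:
    """FID from two (n_samples, dim) feature arrays, dim >= 2."""
    real = np.asarray(real_feats, dtype=float)
    fake = np.asarray(fake_feats, dtype=float)
    return frechet_distance(
        real.mean(axis=0), np.atleast_2d(np.cov(real, rowvar=False)),
        fake.mean(axis=0), np.atleast_2d(np.cov(fake, rowvar=False)),
    )
```

As a check, two identity-covariance Gaussians whose means differ by (3, 4) give a distance of 25, since only the mean term contributes. Routing the matrix square root through a symmetric matrix avoids the complex-valued results a general `sqrtm` can return from rounding error.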

Related Papers

Consistent Story Generation with Asymmetry Zigzag Sampling (2025-06-11)
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization (2025-05-30)
Object Isolated Attention for Consistent Story Visualization (2025-03-30)
Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling (2024-12-30)
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation (2024-12-10)
StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization (2024-12-10)
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization (2024-10-08)
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion (2024-07-17)