Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication

Ruize Wang, Zhongyu Wei, Ying Cheng, Piji Li, Haijun Shan, Ji Zhang, Qi Zhang, Xuanjing Huang

2019-11-11 | COLING 2020 8 | Tasks: Image Captioning, Question Generation, Visual Storytelling

Abstract

Visual storytelling aims to automatically generate a narrative paragraph from a sequence of images. Existing approaches construct a text description independently for each image and roughly concatenate the descriptions into a story, which leads to semantically incoherent content. In this paper, we propose a new approach to visual storytelling that introduces a topic description task to detect the global semantic context of an image stream. A story is then constructed with the guidance of the topic description. To combine the two generation tasks, we propose a multi-agent communication framework that regards the topic description generator and the story generator as two agents and learns them simultaneously via an iterative updating mechanism. We validate our approach on the VIST dataset, where quantitative results, ablations, and human evaluation demonstrate that our method generates higher-quality stories than state-of-the-art methods.
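The alternating exchange between the two agents described in the abstract can be illustrated with a toy sketch. This is only a control-flow illustration under loud assumptions: the names `Agent`, `communicate`, `toy_topic`, and `toy_story` are hypothetical, the "generators" are simple string functions rather than neural models, and the loop shows message passing only, not the paper's training objective or update rule.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Agent:
    """Hypothetical agent: maps (image tags, message from the other agent) to text."""
    name: str
    generate: Callable[[List[str], str], str]


def toy_topic(images: List[str], story: str) -> str:
    # Toy stand-in for a topic generator: the most frequent word
    # across the image tags and the current story draft.
    words = [w for tags in images for w in tags.split()] + story.split()
    return Counter(words).most_common(1)[0][0]


def toy_story(images: List[str], topic: str) -> str:
    # Toy stand-in for a story generator: one sentence per image,
    # each conditioned on the shared topic.
    return " ".join(f"On our {topic} trip we saw {tags}." for tags in images)


def communicate(images: List[str], topic_agent: Agent, story_agent: Agent,
                rounds: int) -> Tuple[str, str]:
    """Alternate between the two agents for a fixed number of rounds."""
    topic, story = "", ""
    for _ in range(rounds):
        topic = topic_agent.generate(images, story)   # topic sees the current story
        story = story_agent.generate(images, topic)   # story is guided by the topic
    return topic, story


topic_agent = Agent("topic", toy_topic)
story_agent = Agent("story", toy_story)

if __name__ == "__main__":
    images = ["beach family", "beach picnic", "beach sunset"]
    print(communicate(images, topic_agent, story_agent, rounds=2))
```

The point of the sketch is the shared topic: every sentence is conditioned on the same global context instead of being generated per image in isolation.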

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Generation | VIST | BLEU-1 | 64.2 | TAVST (RL) |
| Text Generation | VIST | BLEU-2 | 39.6 | TAVST (RL) |
| Text Generation | VIST | BLEU-3 | 23.7 | TAVST (RL) |
| Text Generation | VIST | BLEU-4 | 14.6 | TAVST (RL) |
| Text Generation | VIST | CIDEr | 9.2 | TAVST (RL) |
| Text Generation | VIST | METEOR | 35.7 | TAVST (RL) |
| Text Generation | VIST | ROUGE-L | 31 | TAVST (RL) |
| Data-to-Text Generation | VIST | BLEU-1 | 64.2 | TAVST (RL) |
| Data-to-Text Generation | VIST | BLEU-2 | 39.6 | TAVST (RL) |
| Data-to-Text Generation | VIST | BLEU-3 | 23.7 | TAVST (RL) |
| Data-to-Text Generation | VIST | BLEU-4 | 14.6 | TAVST (RL) |
| Data-to-Text Generation | VIST | CIDEr | 9.2 | TAVST (RL) |
| Data-to-Text Generation | VIST | METEOR | 35.7 | TAVST (RL) |
| Data-to-Text Generation | VIST | ROUGE-L | 31 | TAVST (RL) |
| Visual Storytelling | VIST | BLEU-1 | 64.2 | TAVST (RL) |
| Visual Storytelling | VIST | BLEU-2 | 39.6 | TAVST (RL) |
| Visual Storytelling | VIST | BLEU-3 | 23.7 | TAVST (RL) |
| Visual Storytelling | VIST | BLEU-4 | 14.6 | TAVST (RL) |
| Visual Storytelling | VIST | CIDEr | 9.2 | TAVST (RL) |
| Visual Storytelling | VIST | METEOR | 35.7 | TAVST (RL) |
| Visual Storytelling | VIST | ROUGE-L | 31 | TAVST (RL) |
| Story Generation | VIST | BLEU-1 | 64.2 | TAVST (RL) |
| Story Generation | VIST | BLEU-2 | 39.6 | TAVST (RL) |
| Story Generation | VIST | BLEU-3 | 23.7 | TAVST (RL) |
| Story Generation | VIST | BLEU-4 | 14.6 | TAVST (RL) |
| Story Generation | VIST | CIDEr | 9.2 | TAVST (RL) |
| Story Generation | VIST | METEOR | 35.7 | TAVST (RL) |
| Story Generation | VIST | ROUGE-L | 31 | TAVST (RL) |
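
For readers unfamiliar with the BLEU-1 through BLEU-4 rows above, the following is a minimal sketch of how a cumulative BLEU-n score is computed for one candidate against one reference. It assumes uniform n-gram weights, no smoothing, and whitespace tokenization; the function names are illustrative, the result is on a 0 to 1 scale, and this is not the evaluation script behind the reported numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against a single reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n):
    """Cumulative BLEU up to max_n: geometric mean of precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    cand = "the dog ran to the park".split()
    ref = "the dog ran in the park".split()
    for n in range(1, 5):
        print(f"BLEU-{n}: {bleu(cand, ref, n):.4f}")
```

Because higher-order n-grams are harder to match, cumulative scores typically fall as n grows, which is the same pattern seen from BLEU-1 to BLEU-4 in the table.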
