
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation

Taehyeong Kim, Min-Oh Heo, Seonil Son, Kyoung-Wha Park, Byoung-Tak Zhang

2018-05-28 | Story Generation | Visual Storytelling

Abstract

The task of multi-image cued story generation, as in the Visual Storytelling (VIST) dataset challenge, is to compose multiple coherent sentences from a given sequence of images. The main difficulty is generating image-specific sentences within the context of the whole image sequence. Here we propose a deep learning network model, GLAC Net, that generates visual stories by combining global-local (glocal) attention and context cascading mechanisms. The model incorporates two levels of attention, the overall encoding level and the image feature level, to construct image-dependent sentences. While a standard attention configuration needs a large number of parameters, GLAC Net implements both levels in a very simple way via hard connections from the encoder outputs or image features onto the sentence generators. The coherency of the generated story is further improved by cascading the information of the previous sentence serially to the next sentence. We evaluate GLAC Net on the VIST dataset and achieve very competitive results compared to state-of-the-art techniques. Our code and pre-trained models are publicly available.
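The data flow described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes that the "hard connections" amount to concatenating each image's local feature with a shared global context, and it replaces the learned sequence encoder with a plain average so the example stays runnable. The names `global_context`, `glocal_vectors`, `generate_story`, and `decode_sentence` are hypothetical.

```python
from typing import Any, Callable, List, Tuple

Vector = List[float]


def global_context(image_features: List[Vector]) -> Vector:
    """Stand-in for the sequence encoder (assumption: a simple mean).

    The paper uses a learned encoder over the image sequence; averaging is
    used here only to keep the sketch free of external dependencies.
    """
    n = len(image_features)
    dim = len(image_features[0])
    return [sum(f[d] for f in image_features) / n for d in range(dim)]


def glocal_vectors(image_features: List[Vector]) -> List[Vector]:
    """Concatenate each local image feature with the shared global context.

    Assumption: the "hard connection" is modeled as plain concatenation.
    """
    ctx = global_context(image_features)
    return [local + ctx for local in image_features]


def generate_story(
    image_features: List[Vector],
    decode_sentence: Callable[[Vector, Any], Tuple[Any, Any]],
    initial_state: Any,
) -> List[Any]:
    """Decode one sentence per image, cascading the decoder state.

    `decode_sentence` is a hypothetical callable that takes a glocal vector
    and the state left by the previous sentence, and returns the new
    sentence together with the state to pass on to the next one.
    """
    state = initial_state
    sentences = []
    for glocal in glocal_vectors(image_features):
        sentence, state = decode_sentence(glocal, state)
        sentences.append(sentence)
    return sentences
```

In this sketch the cascading is the single `state` variable threaded through the loop: each sentence generator starts from whatever the previous one left behind instead of from a fresh state.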

Results

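| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Generation | VIST | METEOR | 30.14 | GLAC Net |
| Data-to-Text Generation | VIST | METEOR | 30.14 | GLAC Net |
| Visual Storytelling | VIST | METEOR | 30.14 | GLAC Net |
| Story Generation | VIST | METEOR | 30.14 | GLAC Net |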
