Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Hierarchically-Attentive RNN for Album Summarization and Storytelling

Licheng Yu, Mohit Bansal, Tamara L. Berg

2017-08-09 | EMNLP 2017 | Tasks: Retrieval, Visual Storytelling

Abstract

We address the problem of end-to-end visual storytelling. Given a photo album, our model first selects the most representative (summary) photos, and then composes a natural language story for the album. For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story. Automatic and human evaluations show our model achieves better performance on selection, generation, and retrieval than baselines.
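The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the paper uses hierarchically-attentive RNNs, whereas the sketch below substitutes a plain dot-product score followed by a softmax, and picks the top-k photos. The names `select_summary_photos`, `photo_features`, and `query`, along with the toy feature values, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_summary_photos(photo_features, query, k):
    """Score each photo against a query vector and keep the top-k.

    Assumption: a dot product stands in for the learned attention
    of the paper's selector RNN. Returns the chosen indices in
    album order, plus the attention weights over all photos.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in photo_features]
    weights = softmax(scores)
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    # Restore album order so a downstream story follows the photo sequence.
    return sorted(top), weights


# Toy album: four photos, each with a 2-dimensional feature (hypothetical values).
album = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.1]]
query = [1.0, 0.0]
selected, weights = select_summary_photos(album, query, k=2)
print(selected)
```

In a full pipeline, the selected photos would then be passed to a decoder that composes the story; that stage is omitted here because the abstract does not give enough detail to sketch it faithfully.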

Results

Task                     Dataset  Metric   Value  Model
Text Generation          VIST     BLEU-3   20.78  h-attn-rank
Text Generation          VIST     CIDEr     7.38  h-attn-rank
Text Generation          VIST     METEOR   33.94  h-attn-rank
Text Generation          VIST     ROUGE-L  29.82  h-attn-rank
Data-to-Text Generation  VIST     BLEU-3   20.78  h-attn-rank
Data-to-Text Generation  VIST     CIDEr     7.38  h-attn-rank
Data-to-Text Generation  VIST     METEOR   33.94  h-attn-rank
Data-to-Text Generation  VIST     ROUGE-L  29.82  h-attn-rank
Visual Storytelling      VIST     BLEU-3   20.78  h-attn-rank
Visual Storytelling      VIST     CIDEr     7.38  h-attn-rank
Visual Storytelling      VIST     METEOR   33.94  h-attn-rank
Visual Storytelling      VIST     ROUGE-L  29.82  h-attn-rank
Story Generation         VIST     BLEU-3   20.78  h-attn-rank
Story Generation         VIST     CIDEr     7.38  h-attn-rank
Story Generation         VIST     METEOR   33.94  h-attn-rank
Story Generation         VIST     ROUGE-L  29.82  h-attn-rank

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Context-Aware Search and Retrieval Over Erasure Channels (2025-07-16)
Seq vs Seq: An Open Suite of Paired Encoders and Decoders (2025-07-15)