Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Storytelling via Predicting Anchor Word Embeddings in the Stories

Bowen Zhang, Hexiang Hu, Fei Sha

2020-01-13 | Word Embeddings, Visual Storytelling

Abstract

We propose a learning model for the task of visual storytelling. The main idea is to predict anchor word embeddings from the images and to use those embeddings together with the image features to generate narrative sentences. To train the predictor, we use the embeddings of nouns randomly sampled from the ground-truth stories as the target anchor word embeddings. To narrate a sequence of images, we feed the predicted anchor word embeddings and the image features jointly into a seq2seq model. Compared with state-of-the-art methods, the proposed model is simple in design and easy to optimize, yet it attains the best results on most automatic evaluation metrics. In human evaluation, the method also outperforms competing methods.
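The two ideas in the abstract (regressing a randomly sampled ground-truth noun's embedding from the image, then feeding image features plus the predicted anchor into the story generator) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy `EMBEDDINGS` table, the hard-coded noun list standing in for a part-of-speech tagger, the hypothetical `LinearAnchorPredictor` trained with a squared L2 loss, and concatenation as the way the two inputs are combined are all choices made here for illustration. The paper's actual predictor, loss, and seq2seq model may differ.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy embedding table (assumption); a real system would use
# pretrained word vectors. Values are made up for illustration only.
EMBEDDINGS: Dict[str, Vector] = {
    "dog": [1.0, 0.0, 0.0],
    "beach": [0.0, 1.0, 0.0],
    "ball": [0.0, 0.0, 1.0],
}
# Hypothetical noun list standing in for a part-of-speech tagger (assumption).
NOUNS = set(EMBEDDINGS)


def sample_anchor_target(story: Sequence[str], rng: random.Random) -> Vector:
    """Pick a random noun from a ground-truth story and return a copy of its embedding."""
    candidates = [tok for tok in story if tok in NOUNS]
    if not candidates:
        raise ValueError("story contains no known nouns")
    return list(EMBEDDINGS[rng.choice(candidates)])


class LinearAnchorPredictor:
    """Hypothetical linear map from image features to an anchor word embedding."""

    def __init__(self, in_dim: int, out_dim: int, lr: float = 0.1) -> None:
        # Weight matrix of shape (out_dim, in_dim), initialized to zeros.
        self.w = [[0.0] * in_dim for _ in range(out_dim)]
        self.lr = lr

    def predict(self, x: Sequence[float]) -> Vector:
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def step(self, x: Sequence[float], target: Sequence[float]) -> float:
        """One SGD step on the squared L2 error; returns the loss before the update."""
        pred = self.predict(x)
        err = [p - t for p, t in zip(pred, target)]
        # Gradient of sum(err^2) w.r.t. w[i][j] is 2 * err[i] * x[j].
        for row, e in zip(self.w, err):
            for j, xj in enumerate(x):
                row[j] -= self.lr * 2.0 * e * xj
        return sum(e * e for e in err)


def joint_input(image_feature: Sequence[float], anchor: Sequence[float]) -> Vector:
    """Concatenate image features with the anchor embedding (assumed fusion step)."""
    return list(image_feature) + list(anchor)


if __name__ == "__main__":
    rng = random.Random(0)
    story = ["the", "dog", "chased", "a", "ball", "on", "the", "beach"]
    image_feature = [0.5, 1.0]  # made-up image feature vector
    predictor = LinearAnchorPredictor(in_dim=2, out_dim=3)
    # Resample the target noun at every step, as the abstract describes.
    for _ in range(100):
        predictor.step(image_feature, sample_anchor_target(story, rng))
    anchor = predictor.predict(image_feature)
    print(joint_input(image_feature, anchor))  # 5-dim input for the encoder
```

Because the target noun is resampled at each step, the prediction for a fixed image in this toy setup hovers around the average embedding of the story's nouns (roughly [1/3, 1/3, 1/3] here) rather than matching any single noun.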

Results

Dataset: VIST
Model: StoryAnchor: w/ Predicted Nouns
Tasks: Text Generation, Data-to-Text Generation, Visual Storytelling, Story Generation (identical results are listed under each task)

Metric     Value
BLEU-1     65.1
BLEU-2     40
BLEU-3     23.4
BLEU-4     14
CIDEr      9.9
METEOR     35.5
ROUGE-L    30

Related Papers

Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
Computational Detection of Intertextual Parallels in Biblical Hebrew: A Benchmark Study Using Transformer-Based Language Models (2025-06-30)
Shape2Animal: Creative Animal Generation from Natural Silhouettes (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Low-resource keyword spotting using contrastively trained transformer acoustic word embeddings (2025-06-21)
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent (2025-06-21)
Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings (2025-06-16)
Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform (2025-06-11)