
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Summarizing Videos with Attention

Jiri Fajtl, Hajar Sadeghi Sokeh, Vasileios Argyriou, Dorothy Monekosso, Paolo Remagnino

2018-12-05 | Video Summarization

Abstract

In this work we propose a novel method for supervised, keyshot-based video summarization that applies a conceptually simple and computationally efficient soft self-attention mechanism. Current state-of-the-art methods leverage bidirectional recurrent networks such as BiLSTM combined with attention. These networks are complex to implement and computationally demanding compared to fully connected networks. To that end, we propose a simple self-attention-based network for video summarization that performs the entire sequence-to-sequence transformation in a single feed-forward pass and a single backward pass during training. Our method sets new state-of-the-art results on two benchmarks commonly used in this domain, TvSum and SumMe.
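To make the idea of soft self-attention over a frame sequence concrete, here is a minimal sketch in dependency-free Python. It is not the authors' VASNet code: the function names, the random projection matrices, the toy dimensions, and the 1/sqrt(d) scaling are all assumptions chosen for illustration. It only shows how every frame can attend to every other frame in one forward pass, with no recurrence.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_self_attention(frames, w_query, w_key, w_value):
    """Return (contexts, weights) for a sequence of frame feature vectors.

    weights[t][i] is how much frame t attends to frame i (each row sums to 1).
    contexts[t] is the attention-weighted sum of the projected value vectors.
    """
    queries = [matvec(w_query, f) for f in frames]
    keys = [matvec(w_key, f) for f in frames]
    values = [matvec(w_value, f) for f in frames]
    scale = 1.0 / math.sqrt(len(queries[0]))  # assumed scaling, not from the abstract
    out_dim = len(values[0])

    weights, contexts = [], []
    for q in queries:
        # Similarity of this frame's query to every frame's key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        row = softmax(scores)
        weights.append(row)
        contexts.append(
            [sum(w * v[d] for w, v in zip(row, values)) for d in range(out_dim)]
        )
    return contexts, weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_frames, feat_dim = 5, 4  # toy sizes; real frame features are far larger
frames = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(num_frames)]
w_q = random_matrix(feat_dim, feat_dim, rng)
w_k = random_matrix(feat_dim, feat_dim, rng)
w_v = random_matrix(feat_dim, feat_dim, rng)

contexts, weights = soft_self_attention(frames, w_q, w_k, w_v)
print(len(contexts), len(contexts[0]))
```

In a summarization setting, each context vector would typically be passed to a small regressor that predicts a per-frame importance score. That step is omitted here because the abstract does not specify it.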

Results

Task                 Dataset  Metric                 Value  Model
Video                TvSum    F1-score (Augmented)   62.37  VASNet
Video                TvSum    F1-score (Canonical)   61.42  VASNet
Video                SumMe    F1-score (Augmented)   51.09  VASNet
Video                SumMe    F1-score (Canonical)   49.71  VASNet
Video Summarization  TvSum    F1-score (Augmented)   62.37  VASNet
Video Summarization  TvSum    F1-score (Canonical)   61.42  VASNet
Video Summarization  SumMe    F1-score (Augmented)   51.09  VASNet
Video Summarization  SumMe    F1-score (Canonical)   49.71  VASNet

Related Papers

TRIM: A Self-Supervised Video Summarization Framework Maximizing Temporal Relative Information and Representativeness (2025-06-25)
MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment (2025-06-12)
Prompts to Summaries: Zero-Shot Language-Guided Video Summarization (2025-06-12)
Enhancing Video Memorability Prediction with Text-Motion Cross-modal Contrastive Loss and Its Application in Video Summarization (2025-06-10)
TriPSS: A Tri-Modal Keyframe Extraction Framework Using Perceptual, Structural, and Semantic Representations (2025-06-03)
Unsupervised Transcript-assisted Video Summarization and Highlight Detection (2025-05-29)
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing (2025-05-24)
SD-VSum: A Method and Dataset for Script-Driven Video Summarization (2025-05-06)