Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Summarization with Attention-Based Encoder-Decoder Networks

Zhong Ji, Kailin Xiong, Yanwei Pang, Xuelong Li

2017-08-31 | Supervised Video Summarization | Video Summarization

Abstract

This paper addresses the problem of supervised video summarization by formulating it as a sequence-to-sequence learning problem, where the input is a sequence of original video frames and the output is a keyshot sequence. Our key idea is to learn a deep summarization network with an attention mechanism that mimics the way humans select keyshots. To this end, we propose a novel video summarization framework named Attentive encoder-decoder networks for Video Summarization (AVS), in which the encoder uses a Bidirectional Long Short-Term Memory (BiLSTM) to encode the contextual information among the input video frames. As for the decoder, two attention-based LSTM networks are explored, using additive and multiplicative objective functions, respectively. Extensive experiments are conducted on two video summarization benchmark datasets, SumMe and TVSum. The results demonstrate the superiority of the proposed AVS-based approaches over state-of-the-art approaches, with improvements from 0.8% to 3% on the two datasets.
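The abstract names two decoder variants, additive and multiplicative, without giving their formulas. As a rough illustration only, the sketch below assumes they correspond to the two standard attention scoring functions: an additive score `v^T tanh(W_s s + W_h h)` and a multiplicative score `s^T W h`, where `s` is a decoder state and `h` an encoder state. All function names, weight matrices, and toy values are hypothetical and are not taken from the paper.

```python
import math

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_score(s, h, W_s, W_h, v):
    # Assumed additive form: v^T tanh(W_s s + W_h h).
    a = matvec(W_s, s)
    b = matvec(W_h, h)
    return dot(v, [math.tanh(x + y) for x, y in zip(a, b)])

def multiplicative_score(s, h, W):
    # Assumed multiplicative form: s^T W h.
    return dot(s, matvec(W, h))

def attend(s, encoder_states, score_fn):
    # Score every encoder state against the decoder state s,
    # normalize the scores, and return the weighted context vector.
    scores = [score_fn(s, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy usage with 2-dimensional states and identity weights (illustrative only).
I2 = [[1.0, 0.0], [0.0, 1.0]]
decoder_state = [1.0, 0.0]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

mul_weights, mul_context = attend(
    decoder_state, encoder_states,
    lambda s, h: multiplicative_score(s, h, I2))
add_weights, add_context = attend(
    decoder_state, encoder_states,
    lambda s, h: additive_score(s, h, I2, I2, [1.0, 1.0]))
```

In a real model the encoder states would come from the BiLSTM and the weights would be learned; here they are fixed so that the two scoring rules can be compared on the same inputs.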

Results

Task                 Dataset  Metric                Value  Model
Video                TvSum    F1-score (Augmented)  61.8   M-AVS
Video                TvSum    F1-score (Canonical)  61     M-AVS
Video                SumMe    F1-score (Augmented)  46.1   M-AVS
Video                SumMe    F1-score (Canonical)  44.4   M-AVS
Video Summarization  TvSum    F1-score (Augmented)  61.8   M-AVS
Video Summarization  TvSum    F1-score (Canonical)  61     M-AVS
Video Summarization  SumMe    F1-score (Augmented)  46.1   M-AVS
Video Summarization  SumMe    F1-score (Canonical)  44.4   M-AVS
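
The page reports F1-scores but does not define how they are computed. A minimal sketch, assuming the usual overlap-based F-measure between a predicted summary and a reference summary, each given as a per-frame 0/1 selection mask; the function name and the toy masks are hypothetical, and the paper's exact evaluation protocol may differ:

```python
def summary_f1(predicted, reference):
    # predicted, reference: equal-length sequences of 0/1 flags,
    # where 1 means the frame is included in the summary.
    if len(predicted) != len(reference):
        raise ValueError("masks must cover the same number of frames")
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(1 for p in predicted if p)
    recall = overlap / sum(1 for r in reference if r)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 of 3 selected frames agree with the reference.
score = summary_f1([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

The table values appear to be this kind of score expressed as a percentage, so a returned value of 0.618 would correspond to an entry of 61.8.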

Related Papers

TRIM: A Self-Supervised Video Summarization Framework Maximizing Temporal Relative Information and Representativeness (2025-06-25)
MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment (2025-06-12)
Prompts to Summaries: Zero-Shot Language-Guided Video Summarization (2025-06-12)
Enhancing Video Memorability Prediction with Text-Motion Cross-modal Contrastive Loss and Its Application in Video Summarization (2025-06-10)
TriPSS: A Tri-Modal Keyframe Extraction Framework Using Perceptual, Structural, and Semantic Representations (2025-06-03)
Unsupervised Transcript-assisted Video Summarization and Highlight Detection (2025-05-29)
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing (2025-05-24)
SD-VSum: A Method and Dataset for Script-Driven Video Summarization (2025-05-06)