


Supervised Video Summarization via Multiple Feature Sets with Parallel Attention

Junaid Ahmed Ghauri, Sherzod Hakimov, Ralph Ewerth

2021-04-23 | Multimodal Deep Learning | Supervised Video Summarization | Video Summarization

Abstract

The assignment of importance scores to particular frames or (short) segments in a video is crucial for summarization, but also a difficult task. Previous work utilizes only one source of visual features. In this paper, we propose a novel model architecture that combines three feature sets for visual content and motion to predict importance scores. The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model. Comprehensive experimental evaluations are reported for two well-known datasets, SumMe and TVSum. In this context, we identify methodological issues in how previous work used these benchmark datasets, and present a fair evaluation scheme with appropriate data splits that can be used in future work. When using static and motion features with a parallel attention mechanism, we improve state-of-the-art results for SumMe, while being on par with the state of the art for the other dataset.
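As a rough illustration of the idea described in the abstract (attention applied to each feature set in parallel, followed by fusion and per-frame scoring), here is a minimal sketch in plain Python. It is not the authors' implementation: the identity query/key/value projections, the element-wise sum used for fusion, the single linear scoring layer with a sigmoid, the feature dimensions, and all function names are assumptions made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention over a sequence of frame features.

    Assumption: queries, keys, and values are the raw features themselves
    (identity projections); a trained model would learn these projections.
    """
    dim = len(frames[0])
    attended = []
    for query in frames:
        logits = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        weights = softmax(logits)
        attended.append(
            [sum(w * frame[j] for w, frame in zip(weights, frames)) for j in range(dim)]
        )
    return attended


def predict_importance(feature_sets, scoring_weights, bias=0.0):
    """Attend over each feature set separately, fuse, then score every frame."""
    # One attention branch per feature set (the "parallel" part).
    branches = [self_attention(features) for features in feature_sets]
    num_frames = len(feature_sets[0])
    dim = len(scoring_weights)
    scores = []
    for t in range(num_frames):
        # Assumed fusion: element-wise sum of the attended branches.
        fused = [sum(branch[t][j] for branch in branches) for j in range(dim)]
        logit = sum(w * x for w, x in zip(scoring_weights, fused)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid maps to (0, 1)
    return scores


random.seed(0)
NUM_FRAMES, DIM = 6, 4  # hypothetical sizes, chosen only for the demo


def random_features():
    return [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_FRAMES)]


# Three hypothetical feature sets standing in for static and motion features.
feature_sets = [random_features(), random_features(), random_features()]
scoring_weights = [random.gauss(0.0, 0.5) for _ in range(DIM)]
scores = predict_importance(feature_sets, scoring_weights)
print([round(s, 3) for s in scores])
```

Each branch only sees its own feature set, so the attention weights of one stream cannot be dominated by another before fusion; that separation is the design point the sketch tries to show.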

Results

Task: Video Summarization (the same results are also listed under the task "Video")

| Dataset | Metric | Value | Model |
|---|---|---|---|
| TvSum | F1-score (Canonical) | 67.5 | MAVS [DBLP:conf/mm/FengLKZ18] |
| TvSum | F1-score (Canonical) | 63.9 | re-SEQ2SEQ [DBLP:conf/eccv/ZhangGS18] |
| TvSum | F1-score (Canonical) | 63.7 | MC-VSA [DBLP:journals/corr/abs-2006-01410] |
| TvSum | F1-score (Canonical) | 61.5 | MSVA |
| TvSum | Kendall's Tau | 0.19 | MSVA |
| TvSum | Spearman's Rho | 0.21 | MSVA |
| TvSum | F1-score (Canonical) | 61 | M-AVS [DBLP:journals/corr/abs-1708-09545] |
| TvSum | F1-score (Canonical) | 59.8 | VASNet [DBLP:conf/accv/FajtlSAMR18] |
| SumMe | F1-score (Canonical) | 53.4 | MSVA |
| SumMe | Kendall's Tau | 0.2 | MSVA |
| SumMe | Spearman's Rho | 0.23 | MSVA |
| SumMe | F1-score (Canonical) | 51.6 | MC-VSA [DBLP:journals/corr/abs-2006-01410] |
| SumMe | F1-score (Canonical) | 48 | VASNet [DBLP:conf/accv/FajtlSAMR18] |
| SumMe | Kendall's Tau | 0.16 | VASNet [DBLP:conf/accv/FajtlSAMR18] |
| SumMe | Spearman's Rho | 0.17 | VASNet [DBLP:conf/accv/FajtlSAMR18] |
| SumMe | F1-score (Canonical) | 44.9 | re-SEQ2SEQ [DBLP:conf/eccv/ZhangGS18] |
| SumMe | F1-score (Canonical) | 44.4 | M-AVS [DBLP:journals/corr/abs-1708-09545] |
| SumMe | F1-score (Canonical) | 43.1 | MAVS [DBLP:conf/mm/FengLKZ18] |
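For readers unfamiliar with the three metrics reported above, the following plain-Python sketch shows how an F1-score, Kendall's Tau, and Spearman's Rho could be computed for a single video. It is an illustration under stated assumptions, not the benchmark's evaluation code: the F1-score compares one predicted frame selection against one reference selection, Kendall's Tau is the simple tau-a variant (tied pairs count as neither concordant nor discordant), the inputs are assumed to be non-constant, and the sample scores are made up.

```python
import math


def f1_score(predicted, reference):
    """F1 overlap between two binary frame selections (1 = frame in summary)."""
    overlap = sum(p * r for p, r in zip(predicted, reference))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted)
    recall = overlap / sum(reference)
    return 2 * precision * recall / (precision + recall)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            direction = (x[i] - x[j]) * (y[i] - y[j])
            if direction > 0:
                concordant += 1
            elif direction < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return covariance / (spread_x * spread_y)


# Made-up per-frame importance scores for a five-frame video.
predicted_scores = [0.1, 0.4, 0.35, 0.8, 0.6]
reference_scores = [0.2, 0.3, 0.5, 0.9, 0.4]
tau = kendall_tau(predicted_scores, reference_scores)
rho = spearman_rho(predicted_scores, reference_scores)

# Made-up binary frame selections for the same video.
f1 = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(tau, rho, f1)
```

The rank correlations compare the ordering of predicted and reference importance scores directly, whereas the F1-score only compares the frames that end up in the final summary.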

Related Papers

TRIM: A Self-Supervised Video Summarization Framework Maximizing Temporal Relative Information and Representativeness (2025-06-25)
MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment (2025-06-12)
Prompts to Summaries: Zero-Shot Language-Guided Video Summarization (2025-06-12)
Enhancing Video Memorability Prediction with Text-Motion Cross-modal Contrastive Loss and Its Application in Video Summarization (2025-06-10)
Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems (2025-06-05)
TriPSS: A Tri-Modal Keyframe Extraction Framework Using Perceptual, Structural, and Semantic Representations (2025-06-03)
Unsupervised Transcript-assisted Video Summarization and Highlight Detection (2025-05-29)
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing (2025-05-24)