Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Weakly Supervised Action Localization by Sparse Temporal Pooling Network

Phuc Nguyen, Ting Liu, Gautam Prasad, Bohyung Han

Date: 2017-12-14
Venue: CVPR 2018 6
Tasks: Weakly Supervised Action Localization, Action Classification, Action Localization, Weakly-supervised Temporal Action Localization, Temporal Localization, Temporal Action Localization

Abstract

We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module and fuse the key segments through adaptive temporal pooling. Our loss function is comprised of two terms that minimize the video-level action classification error and enforce the sparsity of the segment selection. At inference time, we extract and score temporal proposals using temporal class activations and class-agnostic attentions to estimate the time intervals that correspond to target actions. The proposed algorithm attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet1.3 even with its weak supervision.
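
The pipeline described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the sigmoid attention, the weighted-sum pooling, the per-class cross-entropy, the L1 sparsity penalty, the `beta` weight, and the fixed `threshold` used to cut proposals are all assumptions, and every function name below is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_pool(features, attn_w, attn_b):
    """Fuse T segment features (each of length D) into one video-level feature.

    Assumption: attention is a single linear layer followed by a sigmoid, and
    pooling is an attention-weighted sum over time.
    """
    # One class-agnostic attention weight in (0, 1) per segment.
    attn = [sigmoid(dot(attn_w, seg) + attn_b) for seg in features]
    dim = len(features[0])
    pooled = [sum(a * seg[d] for a, seg in zip(attn, features)) for d in range(dim)]
    return pooled, attn

def video_loss(features, labels, attn_w, attn_b, cls_w, cls_b, beta=1e-4):
    """Two-term loss: video-level classification error plus attention sparsity.

    Assumptions: independent sigmoid classifiers per class (multi-label),
    an L1 norm on the attention weights, and a hypothetical weight `beta`.
    """
    pooled, attn = attention_pool(features, attn_w, attn_b)
    eps = 1e-12  # keeps log() finite when a probability saturates
    cls_loss = 0.0
    for w_c, b_c, y in zip(cls_w, cls_b, labels):
        p = min(max(sigmoid(dot(w_c, pooled) + b_c), eps), 1.0 - eps)
        cls_loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    sparsity = sum(abs(a) for a in attn)
    return cls_loss + beta * sparsity

def extract_proposals(attn, class_acts, threshold=0.5):
    """Group consecutive high-scoring segments into (start, end, score) proposals.

    Assumption: each segment is scored by attention times class activation,
    and a fixed threshold selects segments; `end` is exclusive.
    """
    weighted = [a * s for a, s in zip(attn, class_acts)]
    proposals, start = [], None
    # The trailing -inf sentinel closes a run that reaches the last segment.
    for t, w in enumerate(weighted + [float("-inf")]):
        if w > threshold and start is None:
            start = t
        elif w <= threshold and start is not None:
            run = weighted[start:t]
            proposals.append((start, t, sum(run) / len(run)))
            start = None
    return proposals
```

For example, `extract_proposals([0.9, 0.9, 0.1, 0.8], [1.0, 1.0, 1.0, 1.0])` yields two proposals covering segments 0 to 1 and segment 3; the low-attention segment 2 splits them. Only video-level labels enter `video_loss`, which is what makes the setup weakly supervised.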

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | THUMOS 2014 | mAP@0.1:0.7 | 27 | STPN |
| Video | THUMOS 2014 | mAP@0.5 | 16.9 | STPN |
| Video | ActivityNet-1.3 | mAP@0.5 | 29.3 | STPN |
| Temporal Action Localization | THUMOS 2014 | mAP@0.1:0.7 | 27 | STPN |
| Temporal Action Localization | THUMOS 2014 | mAP@0.5 | 16.9 | STPN |
| Temporal Action Localization | ActivityNet-1.3 | mAP@0.5 | 29.3 | STPN |
| Zero-Shot Learning | THUMOS 2014 | mAP@0.1:0.7 | 27 | STPN |
| Zero-Shot Learning | THUMOS 2014 | mAP@0.5 | 16.9 | STPN |
| Zero-Shot Learning | ActivityNet-1.3 | mAP@0.5 | 29.3 | STPN |
| Action Localization | THUMOS 2014 | mAP@0.1:0.7 | 27 | STPN |
| Action Localization | THUMOS 2014 | mAP@0.5 | 16.9 | STPN |
| Action Localization | ActivityNet-1.3 | mAP@0.5 | 29.3 | STPN |
| Weakly Supervised Action Localization | THUMOS 2014 | mAP@0.1:0.7 | 27 | STPN |
| Weakly Supervised Action Localization | THUMOS 2014 | mAP@0.5 | 16.9 | STPN |
| Weakly Supervised Action Localization | ActivityNet-1.3 | mAP@0.5 | 29.3 | STPN |
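
To make the metric column easier to read: in temporal action localization, the number after "@" in mAP@0.5 conventionally denotes a temporal intersection-over-union (tIoU) threshold that a predicted interval must reach against a ground-truth interval to count as correct. The sketch below shows only that matching criterion, under this assumption; it is not the benchmark's official evaluation code, and the function names are hypothetical.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) intervals given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def is_correct(pred, gt, threshold=0.5):
    """A prediction matches a ground-truth interval if tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold
```

A prediction of (2, 10) against a ground truth of (0, 10) overlaps for 8 of 10 seconds of union, so it passes the 0.5 threshold, while (0, 10) against (5, 15) overlaps for 5 of 15 seconds and does not.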
