
SF-Net: Single-Frame Supervision for Temporal Action Localization

Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou

2020-03-15 · ECCV 2020
Tasks: Weakly Supervised Action Localization · Action Localization · Temporal Action Localization
Links: Paper · PDF · Code (official)

Abstract

In this paper, we study an intermediate form of supervision, i.e., single-frame supervision, for temporal action localization (TAL). To obtain the single-frame supervision, annotators are asked to identify only a single frame within the temporal window of an action. This can significantly reduce the labor cost of obtaining full supervision, which requires annotating action boundaries. Compared to weak supervision, which annotates only the video-level label, single-frame supervision introduces extra temporal action signals while maintaining low annotation overhead. To make full use of such single-frame supervision, we propose a unified system called SF-Net. First, we propose to predict an actionness score for each video frame. Along with a typical category score, the actionness score can provide comprehensive information about the occurrence of a potential action and aid temporal boundary refinement during inference. Second, we mine pseudo action and background frames based on the single-frame annotations. We identify pseudo action frames by adaptively expanding each annotated single frame to its nearby, contextual frames, and we mine pseudo background frames from all the unannotated frames across multiple videos. Together with the ground-truth labeled frames, these pseudo-labeled frames are further used for training the classifier. In extensive experiments on THUMOS14, GTEA, and BEOID, SF-Net significantly improves upon state-of-the-art weakly-supervised methods in terms of both segment localization and single-frame localization. Notably, SF-Net achieves comparable results to its fully-supervised counterpart, which requires far more resource-intensive annotation. The code is available at https://github.com/Flowerfan/SF-Net.
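The core mechanism the abstract describes is the mining of pseudo-labeled frames around each single-frame annotation. Below is a minimal sketch of that idea, assuming per-frame class scores are already available; the function name, the ratio-based expansion rule, and the parameters `expand_ratio` and `bg_per_video` are illustrative assumptions, not the official SF-Net implementation (the paper's expansion is adaptive, and the official code at https://github.com/Flowerfan/SF-Net is the reference).

```python
import numpy as np

def mine_pseudo_frames(scores, annotated, expand_ratio=0.5, bg_per_video=5):
    """Illustrative sketch of SF-Net-style pseudo-label mining.

    scores:    (T, C) array of per-frame class scores.
    annotated: list of (frame_idx, class_idx) single-frame labels.
    Returns (pseudo_action, pseudo_background).
    """
    T = scores.shape[0]
    pseudo_action = []
    for t, c in annotated:
        pseudo_action.append((t, c))
        anchor = scores[t, c]
        # Expand to neighboring frames while their score for class c
        # stays close to the annotated frame's score.
        for step in (-1, 1):
            u = t + step
            while 0 <= u < T and scores[u, c] >= expand_ratio * anchor:
                pseudo_action.append((u, c))
                u += step
    labeled = {t for t, _ in pseudo_action}
    # Background mining: take the unannotated frames whose maximum
    # class score is lowest, i.e. the least action-like frames.
    unlabeled = sorted(
        (t for t in range(T) if t not in labeled),
        key=lambda t: scores[t].max(),
    )
    pseudo_background = unlabeled[:bg_per_video]
    return pseudo_action, pseudo_background

# Toy usage: 100 frames, 5 classes, two single-frame annotations.
scores = np.random.rand(100, 5)
actions, background = mine_pseudo_frames(scores, [(30, 2), (70, 4)])
```

Both sets of mined frames are then treated as additional labeled data when training the frame-level classifier, alongside the ground-truth annotated frames.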

Results

The source archive repeats the same nine results under five task tags (Video, Action Localization, Temporal Action Localization, Weakly Supervised Action Localization, and Zero-Shot Learning); deduplicated, the results are:

| Dataset | Metric | Value | Model |
| --- | --- | --- | --- |
| GTEA | mAP@0.1:0.7 | 31.0 | SF-Net |
| GTEA | mAP@0.5 | 19.3 | SF-Net |
| BEOID | mAP@0.1:0.7 | 30.1 | SF-Net |
| BEOID | mAP@0.5 | 16.7 | SF-Net |
| THUMOS 2014 | mAP@0.1:0.5 | 51.2 | SF-Net |
| THUMOS 2014 | mAP@0.1:0.7 | 41.2 | SF-Net |
| THUMOS 2014 | mAP@0.5 | 30.5 | SF-Net |
| ActivityNet-1.2 | Mean mAP | 22.8 | SF-Net |
| ActivityNet-1.2 | mAP@0.5 | 37.8 | SF-Net |
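A note on the metrics: mAP@0.5 is mean average precision at a single temporal-IoU (tIoU) threshold of 0.5, while mAP@0.1:0.7 (or mAP@0.1:0.5) is conventionally the average of mAP over tIoU thresholds from 0.1 to 0.7 (or 0.5) in steps of 0.1. A small sketch of that convention; `single_threshold_map` is an assumed helper standing in for a full evaluation routine, not a real library call:

```python
def temporal_iou(a, b):
    """tIoU of two temporal segments given as (start, end) pairs."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def averaged_map(single_threshold_map, preds, gts, lo=0.1, hi=0.7):
    """mAP@lo:hi -- mean of mAP over tIoU thresholds lo, lo+0.1, ..., hi.

    single_threshold_map(preds, gts, thr) is an assumed helper that
    returns mAP at a single tIoU threshold thr.
    """
    n = int(round((hi - lo) / 0.1)) + 1
    thresholds = [round(lo + 0.1 * i, 1) for i in range(n)]
    return sum(single_threshold_map(preds, gts, t) for t in thresholds) / n
```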

Related Papers

DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
A Review on Coarse to Fine-Grained Animal Action Recognition (2025-06-01)
LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization (2025-05-30)
CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization (2025-05-29)
DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition (2025-05-27)
ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization (2025-05-23)