Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization

Pilhyeon Lee, Hyeran Byun

2021-08-11 · ICCV 2021
Tasks: Weakly Supervised Action Localization · Action Localization · Weakly-supervised Temporal Action Localization · Temporal Action Localization
Paper · PDF · Code (official)

Abstract

We tackle the problem of localizing temporal intervals of actions with only a single frame label for each action instance for training. Owing to label sparsity, existing work fails to learn action completeness, resulting in fragmentary action predictions. In this paper, we propose a novel framework, where dense pseudo-labels are generated to provide completeness guidance for the model. Concretely, we first select pseudo background points to supplement point-level action labels. Then, by taking the points as seeds, we search for the optimal sequence that is likely to contain complete action instances while agreeing with the seeds. To learn completeness from the obtained sequence, we introduce two novel losses that contrast action instances with background ones in terms of action score and feature similarity, respectively. Experimental results demonstrate that our completeness guidance indeed helps the model to locate complete action instances, leading to large performance gains especially under high IoU thresholds. Moreover, we demonstrate the superiority of our method over existing state-of-the-art methods on four benchmarks: THUMOS'14, GTEA, BEOID, and ActivityNet. Notably, our method even performs comparably to recent fully-supervised methods, at a 6 times cheaper annotation cost. Our code is available at https://github.com/Pilhyeon.
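The abstract describes two contrastive completeness losses: one contrasting action scores of action versus background instances, and one contrasting feature similarity. The paper's exact formulation is in the official repository; the following is only a minimal NumPy sketch of that idea, with all function names, the hinge form, and the margin value assumed rather than taken from the paper.

```python
import numpy as np

def completeness_losses(scores, feats, labels, margin=0.5):
    """Toy sketch of score-contrast and feature-contrast losses.
    scores: (T,) per-snippet action scores in [0, 1]
    feats:  (T, D) snippet features
    labels: (T,) dense pseudo-labels (1 = action, 0 = background)
    All names and the exact forms are assumptions, not the paper's.
    """
    act, bkg = labels == 1, labels == 0
    # Score contrast: mean action score should exceed mean background
    # score by at least `margin` (hinge form).
    score_loss = max(0.0, margin - (scores[act].mean() - scores[bkg].mean()))
    # Feature contrast: action features should be more similar to the
    # action prototype (mean action feature) than background features are.
    proto = feats[act].mean(axis=0)
    cos = lambda x: x @ proto / (np.linalg.norm(x, axis=1)
                                 * np.linalg.norm(proto) + 1e-8)
    feat_loss = max(0.0, margin - (cos(feats[act]).mean()
                                   - cos(feats[bkg]).mean()))
    return score_loss, feat_loss
```

Both terms vanish once action snippets are separated from background by the margin, which is the intended pressure toward complete (non-fragmentary) action predictions.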

Results

The same set of results is listed under five task tags (Video, Temporal Action Localization, Zero-Shot Learning, Action Localization, Weakly Supervised Action Localization); the duplicates are consolidated below.

| Dataset         | Metric             | Value | Model |
|-----------------|--------------------|-------|-------|
| GTEA            | avg-mAP (0.1:0.7)  | 43.5  | LACP  |
| GTEA            | mAP@0.5            | 33.9  | LACP  |
| BEOID           | avg-mAP (0.1:0.7)  | 51.8  | LACP  |
| BEOID           | mAP@0.5            | 42.7  | LACP  |
| THUMOS'14       | avg-mAP (0.1:0.5)  | 62.7  | LACP  |
| THUMOS'14       | avg-mAP (0.1:0.7)  | 52.8  | LACP  |
| THUMOS'14       | avg-mAP (0.3:0.7)  | 44.5  | LACP  |
| THUMOS'14       | mAP@0.5            | 45.3  | LACP  |
| ActivityNet-1.3 | mAP@0.5            | 40.4  | LACP  |
| ActivityNet-1.3 | avg-mAP (0.5:0.95) | 25.1  | LACP  |
| ActivityNet-1.2 | mAP@0.5            | 44.0  | LACP  |
| ActivityNet-1.2 | mean mAP           | 26.8  | LACP  |
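The avg-mAP entries above are the arithmetic mean of per-threshold mAP over a range of IoU thresholds (e.g. avg-mAP (0.1:0.7) averages mAP at IoU 0.1, 0.2, ..., 0.7). A short sketch of that bookkeeping, using illustrative numbers rather than LACP's actual per-threshold results:

```python
import numpy as np

def avg_map(map_per_iou, lo, hi, step=0.1):
    """Average mAP over IoU thresholds lo, lo+step, ..., hi.
    Rounding guards against floating-point drift in arange."""
    ious = np.round(np.arange(lo, hi + step / 2, step), 1)
    return float(np.mean([map_per_iou[t] for t in ious]))

# Illustrative per-threshold mAP values only (not the paper's numbers).
map_per_iou = {0.1: 70.0, 0.2: 65.0, 0.3: 60.0, 0.4: 55.0,
               0.5: 50.0, 0.6: 45.0, 0.7: 40.0}
```

With these made-up values, avg-mAP (0.1:0.5) is 60.0 and avg-mAP (0.3:0.7) is 50.0; the three THUMOS'14 avg-mAP rows in the table are produced the same way from the model's per-threshold results.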

Related Papers

- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
- Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
- A Review on Coarse to Fine-Grained Animal Action Recognition (2025-06-01)
- LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization (2025-05-30)
- CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization (2025-05-29)
- DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition (2025-05-27)
- ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization (2025-05-23)