Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Weakly-supervised Temporal Action Localization by Uncertainty Modeling

Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun

2020-06-12 · Tasks: Weakly Supervised Action Localization · Action Classification · Action Localization · Multiple Instance Learning · Weakly-supervised Temporal Action Localization · Out-of-Distribution Detection · Temporal Action Localization
Paper · PDF · Code (official) · Code

Abstract

Weakly-supervised temporal action localization aims to learn to detect the temporal intervals of action classes with only video-level labels. To this end, it is crucial to separate frames of action classes from background frames (i.e., frames not belonging to any action class). In this paper, we present a new perspective on background frames, modeling them as out-of-distribution samples on account of their inconsistency. Background frames can then be detected by estimating the probability of each frame being out-of-distribution, known as uncertainty, but uncertainty cannot be learned directly without frame-level labels. To realize uncertainty learning in the weakly-supervised setting, we leverage the multiple instance learning formulation. Moreover, we introduce a background entropy loss to better discriminate background frames by encouraging their in-distribution (action) probabilities to be uniformly distributed over all action classes. Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles. We demonstrate that our model significantly outperforms state-of-the-art methods on the THUMOS'14 and ActivityNet (1.2 & 1.3) benchmarks. Our code is available at https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling.
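The two training signals described in the abstract can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the function names and the top-k aggregation choice are illustrative assumptions. It shows (a) MIL-style aggregation of per-segment logits into a video-level score trainable with video-level labels, and (b) a background entropy loss that is minimized when a background frame's action probabilities are uniform over all classes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def video_score_topk(segment_logits, k):
    """MIL-style aggregation (illustrative): average the top-k segment
    logits per class to obtain a video-level class score that can be
    supervised with only a video-level label.

    segment_logits: array of shape (T, C) for T segments and C classes.
    """
    topk = np.sort(segment_logits, axis=0)[-k:]   # (k, C)
    return topk.mean(axis=0)                      # (C,)

def background_entropy_loss(bg_logits):
    """Cross-entropy against a uniform target over C classes: minimized
    (value log C) exactly when a background frame's action probabilities
    are uniform, as the background entropy loss encourages.
    """
    p = softmax(bg_logits, axis=-1)
    C = p.shape[-1]
    return float((-np.log(p + 1e-12).sum(axis=-1) / C).mean())
```

With uniform logits the loss equals log C, its minimum; any peaked (confidently "in-distribution") background prediction yields a strictly larger value, which is what drives background frames toward uniform action probabilities.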

Results

All rows report the proposed model (Lee et al.). The archive lists the same numbers under each associated task leaderboard: Video, Temporal Action Localization, Zero-Shot Learning, Action Localization, and Weakly Supervised Action Localization; they are consolidated below, with dataset names unified (THUMOS 2014 / THUMOS14 / THUMOS'14 all denote THUMOS'14).

Dataset         | Metric             | Value | Model
----------------|--------------------|-------|-----------
THUMOS'14       | avg-mAP (0.1:0.5)  | 51.6  | Lee et al.
THUMOS'14       | avg-mAP (0.1:0.7)  | 41.9  | Lee et al.
THUMOS'14       | avg-mAP (0.3:0.7)  | 32.9  | Lee et al.
THUMOS'14       | mAP@0.5            | 33.7  | Lee et al.
ActivityNet-1.3 | mAP@0.5            | 37.0  | Lee et al.
ActivityNet-1.3 | avg-mAP (0.5:0.95) | 23.7  | Lee et al.
ActivityNet-1.2 | mAP@0.5            | 41.2  | Lee et al.
ActivityNet-1.2 | Mean mAP           | 25.9  | Lee et al.
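For reference, the metrics in the table can be sketched as follows. A prediction counts as correct at a given threshold when its temporal IoU (tIoU) with a ground-truth interval meets that threshold, and avg-mAP averages the per-threshold mAP values (e.g. over 0.1:0.1:0.7 on THUMOS'14, 0.5:0.05:0.95 on ActivityNet). The helper names below are illustrative, not from the paper's codebase.

```python
import numpy as np

def temporal_iou(pred, gt):
    """tIoU between two 1-D intervals given as (start, end) in seconds:
    intersection length over union length."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def avg_map(map_per_threshold):
    """avg-mAP: the mean of mAP values computed at each tIoU threshold,
    e.g. {0.1: ..., 0.2: ..., ..., 0.5: ...} for avg-mAP (0.1:0.5)."""
    return float(np.mean(list(map_per_threshold.values())))
```

For example, a predicted interval (0, 10) against a ground-truth interval (5, 15) overlaps for 5 s out of a 15 s union, giving tIoU = 1/3, so it would match at threshold 0.3 but not at 0.5.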

Related Papers

- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning (2025-07-09)
- Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation (2025-07-08)
- FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection (2025-07-06)
- Out-of-distribution detection in 3D applications: a review (2025-07-01)
- The Trilemma of Truth in Large Language Models (2025-06-30)
- Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks (2025-06-26)
- OTSurv: A Novel Multiple Instance Learning Framework for Survival Prediction with Heterogeneity-aware Optimal Transport (2025-06-25)