Papers With Code 2


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization

Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Xiaoyun Zhang, Qi Tian

2021-04-06 · Weakly Supervised Action Localization · Action Localization · Weakly-supervised Temporal Action Localization · Temporal Action Localization
Paper · PDF

Abstract

Weakly-supervised temporal action localization aims to localize actions in untrimmed videos with only video-level action category labels. Most previous methods ignore the incompleteness issue of Class Activation Sequences (CAS) and thus suffer from trivial localization results. To solve this issue, we introduce an adaptive mutual supervision framework (AMS) with two branches: the base branch adopts CAS to localize the most discriminative action regions, while the supplementary branch localizes the less discriminative action regions through a novel adaptive sampler. The adaptive sampler dynamically updates the input of the supplementary branch with a sampling weight sequence negatively correlated with the CAS from the base branch, thereby prompting the supplementary branch to localize the action regions underestimated by the base branch. To promote mutual enhancement between the two branches, we construct mutual location supervision: each branch leverages location pseudo-labels generated from the other branch as localization supervision. By alternately optimizing the two branches over multiple iterations, we progressively complete the action regions. Extensive experiments on THUMOS14 and ActivityNet1.2 demonstrate that the proposed AMS method significantly outperforms the state-of-the-art methods.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | THUMOS14 | avg-mAP (0.1:0.5) | 52 | AMS |
| Video | THUMOS14 | avg-mAP (0.1:0.7) | 42.3 | AMS |
| Video | THUMOS14 | avg-mAP (0.3:0.7) | 32.4 | AMS |
| Temporal Action Localization | THUMOS14 | avg-mAP (0.1:0.5) | 52 | AMS |
| Temporal Action Localization | THUMOS14 | avg-mAP (0.1:0.7) | 42.3 | AMS |
| Temporal Action Localization | THUMOS14 | avg-mAP (0.3:0.7) | 32.4 | AMS |
| Zero-Shot Learning | THUMOS14 | avg-mAP (0.1:0.5) | 52 | AMS |
| Zero-Shot Learning | THUMOS14 | avg-mAP (0.1:0.7) | 42.3 | AMS |
| Zero-Shot Learning | THUMOS14 | avg-mAP (0.3:0.7) | 32.4 | AMS |
| Action Localization | THUMOS14 | avg-mAP (0.1:0.5) | 52 | AMS |
| Action Localization | THUMOS14 | avg-mAP (0.1:0.7) | 42.3 | AMS |
| Action Localization | THUMOS14 | avg-mAP (0.3:0.7) | 32.4 | AMS |
| Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.1:0.5) | 52 | AMS |
| Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.1:0.7) | 42.3 | AMS |
| Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.3:0.7) | 32.4 | AMS |

Related Papers

- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
- Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
- A Review on Coarse to Fine-Grained Animal Action Recognition (2025-06-01)
- LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization (2025-05-30)
- CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization (2025-05-29)
- DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition (2025-05-27)
- ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization (2025-05-23)