Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Action Unit Memory Network for Weakly Supervised Temporal Action Localization

Wang Luo, Tianzhu Zhang, Wenfei Yang, Jingen Liu, Tao Mei, Feng Wu, Yongdong Zhang

Published 2021-04-29 · CVPR 2021
Tasks: Weakly Supervised Action Localization · Action Localization · Weakly-supervised Temporal Action Localization · Temporal Action Localization
Paper · PDF

Abstract

Weakly supervised temporal action localization aims to detect and localize actions in untrimmed videos with only video-level labels during training. However, without frame-level annotations, it is challenging to achieve localization completeness and relieve background interference. In this paper, we present an Action Unit Memory Network (AUMN) for weakly supervised temporal action localization, which can mitigate the above two challenges by learning an action unit memory bank. In the proposed AUMN, two attention modules are designed to update the memory bank adaptively and learn action-unit-specific classifiers. Furthermore, three effective mechanisms (diversity, homogeneity and sparsity) are designed to guide the updating of the memory network. To the best of our knowledge, this is the first work to explicitly model the action units with a memory network. Extensive experimental results on two standard benchmarks (THUMOS14 and ActivityNet) demonstrate that our AUMN performs favorably against state-of-the-art methods. Specifically, the average mAP of IoU thresholds from 0.1 to 0.5 on the THUMOS14 dataset is significantly improved from 47.0% to 52.1%.
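The core idea of the abstract — an attention module that reads from a learned bank of action unit templates — can be sketched roughly as follows. This is a minimal illustration with hypothetical shapes and names, not the paper's implementation: each video segment attends over the memory bank and is reconstructed as a weighted combination of action units.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(segments, memory):
    """Attention-based read of an action unit memory bank (sketch).

    segments: (T, D) per-segment features of an untrimmed video
    memory:   (K, D) learned action unit templates (the memory bank)
    Returns (T, D) features reconstructed from the memory bank.
    """
    attn = softmax(segments @ memory.T, axis=-1)  # (T, K) soft unit assignment
    return attn @ memory                          # (T, D) memory read-out

rng = np.random.default_rng(0)
segs = rng.standard_normal((8, 16))  # 8 segments, 16-dim features (hypothetical)
mem = rng.standard_normal((4, 16))   # 4 hypothetical action units
out = memory_read(segs, mem)
print(out.shape)  # (8, 16)
```

In the paper, the memory bank is additionally updated adaptively during training and regularized by the diversity, homogeneity, and sparsity mechanisms; none of that training machinery is shown here.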

Results

Task                                  | Dataset  | Metric            | Value | Model
--------------------------------------|----------|-------------------|-------|------
Video                                 | THUMOS14 | avg-mAP (0.1:0.5) | 52.1  | AUMN
Video                                 | THUMOS14 | avg-mAP (0.1:0.7) | 41.5  | AUMN
Video                                 | THUMOS14 | avg-mAP (0.3:0.7) | 32.4  | AUMN
Temporal Action Localization          | THUMOS14 | avg-mAP (0.1:0.5) | 52.1  | AUMN
Temporal Action Localization          | THUMOS14 | avg-mAP (0.1:0.7) | 41.5  | AUMN
Temporal Action Localization          | THUMOS14 | avg-mAP (0.3:0.7) | 32.4  | AUMN
Zero-Shot Learning                    | THUMOS14 | avg-mAP (0.1:0.5) | 52.1  | AUMN
Zero-Shot Learning                    | THUMOS14 | avg-mAP (0.1:0.7) | 41.5  | AUMN
Zero-Shot Learning                    | THUMOS14 | avg-mAP (0.3:0.7) | 32.4  | AUMN
Action Localization                   | THUMOS14 | avg-mAP (0.1:0.5) | 52.1  | AUMN
Action Localization                   | THUMOS14 | avg-mAP (0.1:0.7) | 41.5  | AUMN
Action Localization                   | THUMOS14 | avg-mAP (0.3:0.7) | 32.4  | AUMN
Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.1:0.5) | 52.1  | AUMN
Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.1:0.7) | 41.5  | AUMN
Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.3:0.7) | 32.4  | AUMN
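The avg-mAP metric in the table is the mean of per-threshold mAP scores over a range of temporal IoU thresholds in steps of 0.1 (e.g. 0.1:0.5 averages five thresholds). A small sketch of the averaging step, using made-up per-threshold values purely for illustration (the paper reports only the averages, not the per-threshold scores):

```python
# Hypothetical per-threshold mAP values, for illustration only
map_at = {0.1: 60.0, 0.2: 57.0, 0.3: 53.0, 0.4: 49.0, 0.5: 44.0,
          0.6: 37.0, 0.7: 29.0}

def avg_map(map_at, lo, hi):
    """Average mAP over IoU thresholds lo..hi inclusive, step 0.1."""
    n = int(round((hi - lo) / 0.1)) + 1
    thresholds = [round(lo + 0.1 * i, 1) for i in range(n)]
    return sum(map_at[t] for t in thresholds) / len(thresholds)

print(avg_map(map_at, 0.1, 0.5))  # → 52.6 (mean of the first five thresholds)
print(avg_map(map_at, 0.3, 0.7))  # → 42.4
```

A stricter threshold range (e.g. 0.3:0.7) weights tight localization more heavily, which is why avg-mAP (0.3:0.7) is consistently the lowest of the three numbers reported.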

Related Papers

- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
- Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
- A Review on Coarse to Fine-Grained Animal Action Recognition (2025-06-01)
- LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization (2025-05-30)
- CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization (2025-05-29)
- DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition (2025-05-27)
- ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization (2025-05-23)