Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Action Unit Memory Network for Weakly Supervised Temporal Action Localization

Wang Luo, Tianzhu Zhang, Wenfei Yang, Jingen Liu, Tao Mei, Feng Wu, Yongdong Zhang

Published 2021-04-29 · CVPR 2021
Tasks: Weakly Supervised Action Localization · Action Localization · Weakly-supervised Temporal Action Localization · Temporal Action Localization
Paper · PDF

Abstract

Weakly supervised temporal action localization aims to detect and localize actions in untrimmed videos with only video-level labels during training. However, without frame-level annotations, it is challenging to achieve localization completeness and relieve background interference. In this paper, we present an Action Unit Memory Network (AUMN) for weakly supervised temporal action localization, which can mitigate the above two challenges by learning an action unit memory bank. In the proposed AUMN, two attention modules are designed to update the memory bank adaptively and learn action-unit-specific classifiers. Furthermore, three effective mechanisms (diversity, homogeneity and sparsity) are designed to guide the updating of the memory network. To the best of our knowledge, this is the first work to explicitly model the action units with a memory network. Extensive experimental results on two standard benchmarks (THUMOS14 and ActivityNet) demonstrate that our AUMN performs favorably against state-of-the-art methods. Specifically, the average mAP of IoU thresholds from 0.1 to 0.5 on the THUMOS14 dataset is significantly improved from 47.0% to 52.1%.
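The core idea of the abstract — an attention module that reads from a learned bank of action unit templates — can be sketched roughly as follows. This is a minimal illustration with hypothetical shapes and names, not the paper's implementation: each video segment attends over the memory bank and is reconstructed as a weighted combination of action units.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(segments, memory):
    """Attention-based read of an action unit memory bank (sketch).

    segments: (T, D) per-segment features of an untrimmed video
    memory:   (K, D) learned action unit templates (the memory bank)
    Returns (T, D) features reconstructed from the memory bank.
    """
    attn = softmax(segments @ memory.T, axis=-1)  # (T, K) soft unit assignment
    return attn @ memory                          # (T, D) memory read-out

rng = np.random.default_rng(0)
segs = rng.standard_normal((8, 16))  # 8 segments, 16-dim features (hypothetical)
mem = rng.standard_normal((4, 16))   # 4 hypothetical action units
out = memory_read(segs, mem)
print(out.shape)  # (8, 16)
```

In the paper, the memory bank is additionally updated adaptively during training and regularized by the diversity, homogeneity, and sparsity mechanisms; none of that training machinery is shown here.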

Results

Task                                  | Dataset  | Metric            | Value | Model
--------------------------------------|----------|-------------------|-------|------
Video                                 | THUMOS14 | avg-mAP (0.1:0.5) | 52.1  | AUMN
Video                                 | THUMOS14 | avg-mAP (0.1:0.7) | 41.5  | AUMN
Video                                 | THUMOS14 | avg-mAP (0.3:0.7) | 32.4  | AUMN
Temporal Action Localization          | THUMOS14 | avg-mAP (0.1:0.5) | 52.1  | AUMN
Temporal Action Localization          | THUMOS14 | avg-mAP (0.1:0.7) | 41.5  | AUMN
Temporal Action Localization          | THUMOS14 | avg-mAP (0.3:0.7) | 32.4  | AUMN
Zero-Shot Learning                    | THUMOS14 | avg-mAP (0.1:0.5) | 52.1  | AUMN
Zero-Shot Learning                    | THUMOS14 | avg-mAP (0.1:0.7) | 41.5  | AUMN
Zero-Shot Learning                    | THUMOS14 | avg-mAP (0.3:0.7) | 32.4  | AUMN
Action Localization                   | THUMOS14 | avg-mAP (0.1:0.5) | 52.1  | AUMN
Action Localization                   | THUMOS14 | avg-mAP (0.1:0.7) | 41.5  | AUMN
Action Localization                   | THUMOS14 | avg-mAP (0.3:0.7) | 32.4  | AUMN
Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.1:0.5) | 52.1  | AUMN
Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.1:0.7) | 41.5  | AUMN
Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.3:0.7) | 32.4  | AUMN
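The avg-mAP metric in the table is the mean of per-threshold mAP scores over a range of temporal IoU thresholds in steps of 0.1 (e.g. 0.1:0.5 averages five thresholds). A small sketch of the averaging step, using made-up per-threshold values purely for illustration (the paper reports only the averages, not the per-threshold scores):

```python
# Hypothetical per-threshold mAP values, for illustration only
map_at = {0.1: 60.0, 0.2: 57.0, 0.3: 53.0, 0.4: 49.0, 0.5: 44.0,
          0.6: 37.0, 0.7: 29.0}

def avg_map(map_at, lo, hi):
    """Average mAP over IoU thresholds lo..hi inclusive, step 0.1."""
    n = int(round((hi - lo) / 0.1)) + 1
    thresholds = [round(lo + 0.1 * i, 1) for i in range(n)]
    return sum(map_at[t] for t in thresholds) / len(thresholds)

print(avg_map(map_at, 0.1, 0.5))  # → 52.6 (mean of the first five thresholds)
print(avg_map(map_at, 0.3, 0.7))  # → 42.4
```

A stricter threshold range (e.g. 0.3:0.7) weights tight localization more heavily, which is why avg-mAP (0.3:0.7) is consistently the lowest of the three numbers reported.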

Related Papers

- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
- Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
- A Review on Coarse to Fine-Grained Animal Action Recognition (2025-06-01)
- LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization (2025-05-30)
- CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization (2025-05-29)
- DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition (2025-05-27)
- ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization (2025-05-23)