Papers With Code 2


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization

Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Xiaoyun Zhang, Qi Tian

2021-04-06 · Weakly Supervised Action Localization · Action Localization · Weakly-supervised Temporal Action Localization · Temporal Action Localization
Paper · PDF

Abstract

Weakly-supervised temporal action localization aims to localize actions in untrimmed videos with only video-level action category labels. Most previous methods ignore the incompleteness issue of Class Activation Sequences (CAS) and thus suffer from trivial localization results. To solve this issue, we introduce an adaptive mutual supervision framework (AMS) with two branches: the base branch adopts CAS to localize the most discriminative action regions, while the supplementary branch localizes the less discriminative action regions through a novel adaptive sampler. The adaptive sampler dynamically updates the input of the supplementary branch with a sampling weight sequence negatively correlated with the CAS from the base branch, thereby prompting the supplementary branch to localize the action regions underestimated by the base branch. To promote mutual enhancement between the two branches, we construct mutual location supervision: each branch leverages location pseudo-labels generated from the other branch as localization supervision. By alternately optimizing the two branches over multiple iterations, we progressively complete the action regions. Extensive experiments on THUMOS14 and ActivityNet1.2 demonstrate that the proposed AMS method significantly outperforms the state-of-the-art methods.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | THUMOS14 | avg-mAP (0.1:0.5) | 52 | AMS |
| Video | THUMOS14 | avg-mAP (0.1:0.7) | 42.3 | AMS |
| Video | THUMOS14 | avg-mAP (0.3:0.7) | 32.4 | AMS |
| Temporal Action Localization | THUMOS14 | avg-mAP (0.1:0.5) | 52 | AMS |
| Temporal Action Localization | THUMOS14 | avg-mAP (0.1:0.7) | 42.3 | AMS |
| Temporal Action Localization | THUMOS14 | avg-mAP (0.3:0.7) | 32.4 | AMS |
| Zero-Shot Learning | THUMOS14 | avg-mAP (0.1:0.5) | 52 | AMS |
| Zero-Shot Learning | THUMOS14 | avg-mAP (0.1:0.7) | 42.3 | AMS |
| Zero-Shot Learning | THUMOS14 | avg-mAP (0.3:0.7) | 32.4 | AMS |
| Action Localization | THUMOS14 | avg-mAP (0.1:0.5) | 52 | AMS |
| Action Localization | THUMOS14 | avg-mAP (0.1:0.7) | 42.3 | AMS |
| Action Localization | THUMOS14 | avg-mAP (0.3:0.7) | 32.4 | AMS |
| Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.1:0.5) | 52 | AMS |
| Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.1:0.7) | 42.3 | AMS |
| Weakly Supervised Action Localization | THUMOS14 | avg-mAP (0.3:0.7) | 32.4 | AMS |

Related Papers

- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
- Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
- A Review on Coarse to Fine-Grained Animal Action Recognition (2025-06-01)
- LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization (2025-05-30)
- CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization (2025-05-29)
- DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition (2025-05-27)
- ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization (2025-05-23)