Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions

Zhi Li, Lu He, Huijuan Xu

2022-07-24

Action Detection, Weakly Supervised Action Localization, Fine-Grained Action Detection, Action Understanding

Abstract

Action understanding has evolved into the era of fine granularity, as most human behaviors in real life have only minor differences. To detect these fine-grained actions accurately in a label-efficient way, we tackle the problem of weakly-supervised fine-grained temporal action detection in videos for the first time. Without a careful design to capture the subtle differences between fine-grained actions, previous weakly-supervised models for general action detection cannot perform well in the fine-grained setting. We propose to model actions as combinations of reusable atomic actions, which are automatically discovered from data through self-supervised clustering, in order to capture the commonality and individuality of fine-grained actions. The learnt atomic actions, represented by visual concepts, are further mapped to fine and coarse action labels by leveraging the semantic label hierarchy. Our approach constructs a visual representation hierarchy of four levels: clip level, atomic action level, fine action class level and coarse action class level, with supervision at each level. Extensive experiments on two large-scale fine-grained video datasets, FineAction and FineGym, show the benefit of our proposed weakly-supervised model for fine-grained action detection, which achieves state-of-the-art results.
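The idea of discovering reusable atomic actions by clustering clip features, and then describing a video by which atomic actions it contains, can be illustrated with a toy sketch. This is not the paper's implementation: plain k-means stands in for the self-supervised clustering, the 2-D clip features are made up, and the names `kmeans`, `assign` and `atom_histogram` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(features, centroids):
    """Return, for each clip feature, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(f, centroids[k]))
        for f in features
    ]


def kmeans(features, num_atoms, iterations=20, seed=0):
    """Plain k-means; each centroid plays the role of one 'atomic action'."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, num_atoms)]
    for _ in range(iterations):
        labels = assign(features, centroids)
        for k in range(num_atoms):
            members = [f for f, label in zip(features, labels) if label == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def atom_histogram(clip_features, centroids):
    """Describe one video as the fraction of its clips assigned to each atom."""
    labels = assign(clip_features, centroids)
    counts = [0] * len(centroids)
    for label in labels:
        counts[label] += 1
    return [c / len(labels) for c in counts]


# Toy clip features (assumed): two visually distinct groups of clips.
group_a = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
group_b = [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centroids = kmeans(group_a + group_b, num_atoms=2)

# A video mixing both groups is represented by the atoms it is composed of.
mixed_video = [group_a[0], group_b[0], group_b[1], group_a[2]]
print(atom_histogram(mixed_video, centroids))
```

In this toy setting, two fine-grained classes that share some atoms but differ in others would produce overlapping but distinct histograms, which is the intuition behind "commonality and individuality" in the abstract.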

Results

Task	Dataset	Metric	Value	Model
Video	FineAction	mAP	4.1	HAAN
Video	FineAction	mAP IOU@0.5	7.05	HAAN
Video	FineAction	mAP IOU@0.75	3.95	HAAN
Video	FineAction	mAP IOU@0.95	1.14	HAAN
Temporal Action Localization	FineAction	mAP	4.1	HAAN
Temporal Action Localization	FineAction	mAP IOU@0.5	7.05	HAAN
Temporal Action Localization	FineAction	mAP IOU@0.75	3.95	HAAN
Temporal Action Localization	FineAction	mAP IOU@0.95	1.14	HAAN
Zero-Shot Learning	FineAction	mAP	4.1	HAAN
Zero-Shot Learning	FineAction	mAP IOU@0.5	7.05	HAAN
Zero-Shot Learning	FineAction	mAP IOU@0.75	3.95	HAAN
Zero-Shot Learning	FineAction	mAP IOU@0.95	1.14	HAAN
Action Localization	FineAction	mAP	4.1	HAAN
Action Localization	FineAction	mAP IOU@0.5	7.05	HAAN
Action Localization	FineAction	mAP IOU@0.75	3.95	HAAN
Action Localization	FineAction	mAP IOU@0.95	1.14	HAAN
Weakly Supervised Action Localization	FineAction	mAP	4.1	HAAN
Weakly Supervised Action Localization	FineAction	mAP IOU@0.5	7.05	HAAN
Weakly Supervised Action Localization	FineAction	mAP IOU@0.75	3.95	HAAN
Weakly Supervised Action Localization	FineAction	mAP IOU@0.95	1.14	HAAN
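
The "mAP IOU@t" rows score a predicted segment against a ground-truth segment by temporal intersection-over-union. A minimal sketch of that overlap test follows, assuming segments are (start, end) pairs in seconds; the function names and example values are illustrative and not taken from the paper or its evaluation code.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) temporal segments."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def matches(pred, gt, threshold):
    """A prediction counts as a match when its IoU reaches the threshold."""
    return temporal_iou(pred, gt) >= threshold


prediction = (2.0, 6.0)    # hypothetical detected segment
ground_truth = (3.0, 7.0)  # hypothetical annotated segment

# Overlap is 3 s and the union is 5 s, so the IoU is 0.6.
for threshold in (0.5, 0.75, 0.95):
    print(threshold, matches(prediction, ground_truth, threshold))
```

The same prediction is a hit at 0.5 but a miss at 0.75 and 0.95, which is why the reported values fall as the threshold tightens.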
