Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions

Zhi Li, Lu He, Huijuan Xu

2022-07-24
Action Detection, Weakly Supervised Action Localization, Fine-Grained Action Detection, Action Understanding

Abstract

Action understanding has evolved into the era of fine granularity, as most human behaviors in real life have only minor differences. To detect these fine-grained actions accurately in a label-efficient way, we tackle the problem of weakly-supervised fine-grained temporal action detection in videos for the first time. Without a careful design that captures subtle differences between fine-grained actions, previous weakly-supervised models for general action detection do not perform well in the fine-grained setting. We propose to model actions as combinations of reusable atomic actions, which are automatically discovered from data through self-supervised clustering, in order to capture the commonality and individuality of fine-grained actions. The learnt atomic actions, represented by visual concepts, are further mapped to fine and coarse action labels by leveraging the semantic label hierarchy. Our approach constructs a visual representation hierarchy of four levels (clip level, atomic action level, fine action class level, and coarse action class level), with supervision at each level. Extensive experiments on two large-scale fine-grained video datasets, FineAction and FineGym, show the benefit of our proposed weakly-supervised model for fine-grained action detection, which achieves state-of-the-art results.
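The abstract mentions mapping predictions to both fine and coarse labels through a semantic label hierarchy. The snippet below is only a minimal sketch of that idea, not the authors' method: it assumes that a coarse-class score can be obtained by summing the scores of its fine-grained children. The labels, the `FINE_TO_COARSE` map, and the `coarse_scores` helper are all hypothetical and are not taken from FineAction, FineGym, or the official code.

```python
from collections import defaultdict

# Hypothetical fine-to-coarse label hierarchy (illustrative labels only,
# not the FineAction or FineGym taxonomy).
FINE_TO_COARSE = {
    "layup": "basketball",
    "dunk": "basketball",
    "serve": "tennis",
    "volley": "tennis",
}


def coarse_scores(fine_scores, fine_to_coarse):
    """Sum fine-class scores under each coarse parent.

    Summation is an assumed aggregation rule chosen for illustration;
    the paper may map levels differently.
    """
    totals = defaultdict(float)
    for fine_label, score in fine_scores.items():
        totals[fine_to_coarse[fine_label]] += score
    return dict(totals)


# Made-up video-level scores over the fine classes.
fine = {"layup": 0.5, "dunk": 0.25, "serve": 0.125, "volley": 0.125}
coarse = coarse_scores(fine, FINE_TO_COARSE)
print(coarse)
```

With a hierarchy like this, a model can be supervised at the coarse level even when two sibling fine classes are hard to tell apart, which is one plausible reading of "supervision at each level".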

Results

Task                                   Dataset     Metric         Value  Model
Video                                  FineAction  mAP            4.1    HAAN
Video                                  FineAction  mAP IOU@0.5    7.05   HAAN
Video                                  FineAction  mAP IOU@0.75   3.95   HAAN
Video                                  FineAction  mAP IOU@0.95   1.14   HAAN
Temporal Action Localization           FineAction  mAP            4.1    HAAN
Temporal Action Localization           FineAction  mAP IOU@0.5    7.05   HAAN
Temporal Action Localization           FineAction  mAP IOU@0.75   3.95   HAAN
Temporal Action Localization           FineAction  mAP IOU@0.95   1.14   HAAN
Zero-Shot Learning                     FineAction  mAP            4.1    HAAN
Zero-Shot Learning                     FineAction  mAP IOU@0.5    7.05   HAAN
Zero-Shot Learning                     FineAction  mAP IOU@0.75   3.95   HAAN
Zero-Shot Learning                     FineAction  mAP IOU@0.95   1.14   HAAN
Action Localization                    FineAction  mAP            4.1    HAAN
Action Localization                    FineAction  mAP IOU@0.5    7.05   HAAN
Action Localization                    FineAction  mAP IOU@0.75   3.95   HAAN
Action Localization                    FineAction  mAP IOU@0.95   1.14   HAAN
Weakly Supervised Action Localization  FineAction  mAP            4.1    HAAN
Weakly Supervised Action Localization  FineAction  mAP IOU@0.5    7.05   HAAN
Weakly Supervised Action Localization  FineAction  mAP IOU@0.75   3.95   HAAN
Weakly Supervised Action Localization  FineAction  mAP IOU@0.95   1.14   HAAN
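
The IOU@0.5, IOU@0.75, and IOU@0.95 metrics refer to temporal intersection-over-union thresholds: in the usual convention, a predicted segment counts as a correct detection only if its overlap with a ground-truth segment reaches the threshold. The sketch below shows that overlap test for a single pair of segments. The function names and the example segments are illustrative assumptions, and this is not the benchmark's evaluation code, which also handles ranking by confidence and matching across many predictions.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) segments, in seconds."""
    p_start, p_end = pred
    g_start, g_end = gt
    # Overlap is clamped at zero when the segments are disjoint.
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """True if the prediction overlaps the ground truth at or above the threshold."""
    return temporal_iou(pred, gt) >= threshold


# Made-up segments: a prediction shifted one second from the ground truth.
prediction = (2.0, 6.0)
ground_truth = (3.0, 7.0)

iou = temporal_iou(prediction, ground_truth)
matches = [is_match(prediction, ground_truth, t) for t in (0.5, 0.75, 0.95)]
print(iou, matches)
```

Here the two segments share 3 seconds out of a 5-second union, so the prediction passes the 0.5 threshold but fails at 0.75 and 0.95. This is why reported mAP typically drops as the threshold tightens, as it does in the table above.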

Related Papers

LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning (2025-06-26)
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment (2025-06-25)
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans (2025-06-25)
Distributed Activity Detection for Cell-Free Hybrid Near-Far Field Communications (2025-06-17)
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm (2025-06-03)
Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion (2025-06-02)
Joint Activity Detection and Channel Estimation for Massive Connectivity: Where Message Passing Meets Score-Based Generative Priors (2025-05-31)
Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM (2025-05-29)