Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Modeling Multi-Label Action Dependencies for Temporal Action Localization

Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah

2021-03-04 · CVPR 2021 · Tasks: Action Detection, Action Localization, Temporal Action Localization, Multi-Label Classification
Paper · PDF · Code (official)

Abstract

Real-world videos contain many complex actions with inherent relationships between action classes. In this work, we propose an attention-based architecture that models these action relationships for the task of temporal action localization in untrimmed videos. As opposed to previous works that leverage video-level co-occurrence of actions, we distinguish the relationships between actions that occur at the same time-step and actions that occur at different time-steps (i.e. those which precede or follow each other). We define these distinct relationships as action dependencies. We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer. The MLAD layer consists of two branches: a Co-occurrence Dependency Branch and a Temporal Dependency Branch to model co-occurrence action dependencies and temporal action dependencies, respectively. We observe that existing metrics used for multi-label classification do not explicitly measure how well action dependencies are modeled; therefore, we propose novel metrics that consider both co-occurrence and temporal dependencies between action classes. Through empirical evaluation and extensive analysis, we show improved performance over state-of-the-art methods on multi-label action localization benchmarks (MultiTHUMOS and Charades) in terms of f-mAP and our proposed metric.
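The two-branch structure described in the abstract can be sketched with plain self-attention: one branch attends across action classes within each time step (co-occurrence), the other attends across time steps for each class (temporal). This is a minimal illustrative sketch, not the paper's implementation: the feature shapes, the simple averaged fusion, and the absence of learned projection weights are all assumptions made here for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x: array of shape (..., N, D); attends among the N items.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., N, N)
    return softmax(scores, axis=-1) @ x               # (..., N, D)

def mlad_layer_sketch(feats):
    """Illustrative MLAD-style layer (assumed shapes, no learned weights).

    feats: per-time-step, per-class features of shape (T, C, D),
    where T = time steps, C = action classes, D = feature dim.
    """
    # Co-occurrence branch: attention across the C classes at each time step.
    co = self_attention(feats)
    # Temporal branch: attention across the T time steps for each class.
    tmp = np.swapaxes(self_attention(np.swapaxes(feats, 0, 1)), 0, 1)
    # Simple average fusion; the paper's mixing mechanism is learned (assumption here).
    return (co + tmp) / 2.0

# Example with arbitrary dimensions (T=8 time steps, C=5 classes, D=16 features).
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 5, 16))
out = mlad_layer_sketch(feats)   # shape (8, 5, 16), same as the input
```

In a trained model, each branch would additionally apply learned query/key/value projections and the fused output would feed per-class action classifiers; the sketch only shows how the two attention axes separate co-occurrence from temporal dependencies.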

Results

Task                          | Dataset     | Metric      | Value | Model
------------------------------|-------------|-------------|-------|------------------
Video                         | MultiTHUMOS | Average mAP | 14.2  | MLAD
Temporal Action Localization  | MultiTHUMOS | Average mAP | 14.2  | MLAD
Zero-Shot Learning            | MultiTHUMOS | Average mAP | 14.2  | MLAD
Action Localization           | MultiTHUMOS | Average mAP | 14.2  | MLAD
Action Detection              | MultiTHUMOS | mAP         | 51.5  | MLAD
Action Detection              | Charades    | mAP         | 23.7  | MLAD (RGB + Flow)

Related Papers

DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment (2025-06-25)
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Privacy-Preserving Chest X-ray Classification in Latent Space with Homomorphically Encrypted Neural Inference (2025-06-18)
Distributed Activity Detection for Cell-Free Hybrid Near-Far Field Communications (2025-06-17)
Explainable Detection of Implicit Influential Patterns in Conversations via Data Augmentation (2025-06-17)
AgriPotential: A Novel Multi-Spectral and Multi-Temporal Remote Sensing Dataset for Agricultural Potentials (2025-06-13)