Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Action Capsules: Human Skeleton Action Recognition

Ali Farajzadeh Bavil, Hamed Damirchi, Hamid D. Taghirad

Published: 2023-01-30
Tasks: Skeleton Based Action Recognition, Action Recognition, Temporal Action Localization
Paper | PDF

Abstract

Owing to the compact and rich high-level representations it offers, skeleton-based human action recognition has recently become a highly active research topic. Previous studies have demonstrated that investigating joint relationships in the spatial and temporal dimensions provides effective information critical to action recognition. However, effectively encoding global dependencies of joints during spatio-temporal feature extraction remains challenging. In this paper, we introduce the Action Capsule, which identifies action-related key joints by considering the latent correlation of joints in a skeleton sequence. We show that, during inference, our end-to-end network attends to a set of joints specific to each action, whose encoded spatio-temporal features are aggregated to recognize the action. Additionally, the use of multiple stages of action capsules enhances the network's ability to distinguish similar actions. Consequently, our network outperforms state-of-the-art approaches on the N-UCLA dataset and obtains competitive results on the NTU RGB+D dataset, while having significantly lower computational requirements as measured in GFLOPs.
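The core idea described in the abstract, scoring joints by their relevance to an action and aggregating their spatio-temporal features, can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, shapes, and the time-averaged joint descriptors are illustrative assumptions, showing only the attention-weighted pooling pattern over joints.

```python
import numpy as np

def action_capsule_pooling(features, capsule_queries):
    """Hypothetical sketch: each 'capsule' holds a learned query that
    scores the joints of a skeleton sequence, and joint features are
    aggregated under those attention weights.

    features:        (T, J, C) spatio-temporal joint features
                     (T frames, J joints, C channels)
    capsule_queries: (K, C) one learned query per action capsule
    Returns pooled features (K, C) and attention weights (K, J).
    """
    # Collapse time to per-joint descriptors (J, C); a real model
    # would use learned temporal encoding instead of a plain mean.
    joint_desc = features.mean(axis=0)

    # Attention logits: one score per (capsule, joint) pair -> (K, J)
    logits = capsule_queries @ joint_desc.T

    # Softmax over joints so each capsule focuses on its key joints
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted aggregation of joint descriptors per capsule -> (K, C)
    pooled = weights @ joint_desc
    return pooled, weights

rng = np.random.default_rng(0)
feats = rng.standard_normal((30, 25, 64))   # 30 frames, 25 joints, 64 channels
queries = rng.standard_normal((4, 64))      # 4 hypothetical action capsules
pooled, attn = action_capsule_pooling(feats, queries)
```

Each row of `attn` is a distribution over the 25 joints, mirroring the paper's claim that the network attends to a set of joints specific to each action; stacking several such stages would correspond to the multi-stage capsules described above.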

Results

Task                                 Dataset     Metric                     Value   Model
Skeleton Based Action Recognition    N-UCLA      Accuracy                   97.3    Action Capsules
Skeleton Based Action Recognition    NTU RGB+D   Accuracy (Cross-Subject)   90.0    Action Capsules
Skeleton Based Action Recognition    NTU RGB+D   Accuracy (Cross-View)      96.3    Action Capsules

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)