Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

PYSKL: Towards Good Practices for Skeleton Action Recognition

Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin

2022-05-19 | Skeleton Based Action Recognition | Action Recognition

Abstract

We present PYSKL: an open-source toolbox for skeleton-based action recognition based on PyTorch. The toolbox supports a wide variety of skeleton action recognition algorithms, including approaches based on GCN and CNN. In contrast to existing open-source skeleton action recognition projects that include only one or two algorithms, PYSKL implements six different algorithms under a unified framework with both the latest and original good practices to ease the comparison of efficacy and efficiency. We also provide an original GCN-based skeleton action recognition model named ST-GCN++, which achieves competitive recognition performance without any complicated attention schemes, serving as a strong baseline. Meanwhile, PYSKL supports the training and testing of nine skeleton-based action recognition benchmarks and achieves state-of-the-art recognition performance on eight of them. To facilitate future research on skeleton action recognition, we also provide a large number of trained models and detailed benchmark results to give some insights. PYSKL is released at https://github.com/kennymckormick/pyskl and is actively maintained. We will update this report when we add new features or benchmarks. The current version corresponds to PYSKL v0.2.

Results

The page lists the same nine results under each of these tasks: Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, Action Recognition.

Dataset          Metric                      Value   Model
NTU RGB+D 120    Accuracy (Cross-Setup)      90.8    ST-GCN++ [PYSKL, 3D Skeleton]
NTU RGB+D 120    Accuracy (Cross-Subject)    88.6    ST-GCN++ [PYSKL, 3D Skeleton]
NTU RGB+D 120    Ensembled Modalities        4       ST-GCN++ [PYSKL, 3D Skeleton]
NTU RGB+D        Accuracy (CS)               92.6    ST-GCN++ [PYSKL, 3D Skeleton]
NTU RGB+D        Accuracy (CV)               97.4    ST-GCN++ [PYSKL, 3D Skeleton]
NTU RGB+D        Ensembled Modalities        4       ST-GCN++ [PYSKL, 3D Skeleton]
NTU RGB+D        Accuracy (CS)               91.4    ST-GCN [PYSKL, 2D Skeleton]
NTU RGB+D        Accuracy (CV)               98.3    ST-GCN [PYSKL, 2D Skeleton]
NTU RGB+D        Ensembled Modalities        4       ST-GCN [PYSKL, 2D Skeleton]
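
The "Ensembled Modalities" value of 4 indicates that each reported accuracy combines predictions from four input modalities. This page does not say how they are combined. A common approach is late fusion, where per-class scores from each modality are averaged before taking the top prediction. The sketch below illustrates that idea under this assumption; the function names, the equal default weights, and the toy scores are all hypothetical and are not taken from PYSKL.

```python
import numpy as np


def ensemble_scores(scores_per_modality, weights=None):
    """Late fusion: weighted average of per-modality class scores.

    scores_per_modality: list of arrays, each of shape (num_samples, num_classes).
    weights: optional per-modality weights; equal weights are assumed by default.
    """
    stacked = np.stack(scores_per_modality)          # (M, N, C)
    if weights is None:
        weights = np.ones(len(scores_per_modality))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                # normalise to sum to 1
    return np.tensordot(weights, stacked, axes=1)    # (N, C)


def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class matches the label."""
    return float((scores.argmax(axis=1) == labels).mean()) * 100.0


if __name__ == "__main__":
    # Toy data: four hypothetical modality streams, 6 samples, 3 classes.
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, size=6)
    streams = [rng.random((6, 3)) for _ in range(4)]
    fused = ensemble_scores(streams)
    print("fused shape:", fused.shape)
    print("top-1 accuracy (%):", top1_accuracy(fused, labels))
```

With real models, each entry of `streams` would hold the class scores produced by one modality-specific network on the same test samples, in the same order.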

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)