
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

NTU-X: An Enhanced Large-scale Dataset for Improving Pose-based Recognition of Subtle Human Actions

Neel Trivedi, Anirudh Thatipelli, Ravi Kiran Sarvadevabhatla

2021-01-27 | Skeleton Based Action Recognition | Action Recognition

Abstract

The lack of fine-grained joints (facial joints, finger joints) is a fundamental performance bottleneck for state-of-the-art skeleton action recognition models. Despite this bottleneck, the community's efforts seem to be invested only in coming up with novel architectures. To specifically address this bottleneck, we introduce two new pose-based human action datasets: NTU60-X and NTU120-X. Our datasets extend the largest existing action recognition dataset, NTU-RGBD. In addition to the 25 body joints for each skeleton as in NTU-RGBD, the NTU60-X and NTU120-X datasets include finger and facial joints, enabling a richer skeleton representation. We appropriately modify the state-of-the-art approaches to enable training using the introduced datasets. Our results demonstrate the effectiveness of these NTU-X datasets in overcoming the aforementioned bottleneck and improving state-of-the-art performance, both overall and on the previously worst-performing action categories. Code and pretrained models can be found at https://github.com/skelemoa/ntu-x.

Results

Dataset: NTU60-X. Metric: Accuracy, by joint set.

Model         Body joints   Body + Fingers joints   Body + Fingers + Face joints
4s-ShiftGCN   89.56         91.78                   89.64
MS-G3D        91.26         91.76                   91.12
PA-ResGCN     89.98         91.64                   89.79

The same nine results are listed under each of the following tasks: Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, Action Recognition.
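
To make the comparison across joint sets easier to read, the following minimal Python sketch computes how each model's NTU60-X accuracy changes relative to its body-joints-only score. The accuracy values are copied from the Results listing above. The dictionary layout, the short keys (`body`, `body+fingers`, `body+fingers+face`), and the helper name `gains_over_body` are illustrative assumptions and do not come from the paper or its code.

```python
# Accuracy values copied from the Results listing for NTU60-X.
# Key names are illustrative shorthand, not identifiers from the paper.
results = {
    "4s-ShiftGCN": {"body": 89.56, "body+fingers": 91.78, "body+fingers+face": 89.64},
    "MS-G3D": {"body": 91.26, "body+fingers": 91.76, "body+fingers+face": 91.12},
    "PA-ResGCN": {"body": 89.98, "body+fingers": 91.64, "body+fingers+face": 89.79},
}


def gains_over_body(scores):
    """Return the accuracy change of each richer joint set relative to body-only."""
    base = scores["body"]
    return {k: round(v - base, 2) for k, v in scores.items() if k != "body"}


# Per-model change in accuracy when finger (and face) joints are added.
gains = {model: gains_over_body(scores) for model, scores in results.items()}

# Joint set with the highest listed accuracy for each model.
best_setting = {model: max(scores, key=scores.get) for model, scores in results.items()}

for model in results:
    print(model, gains[model], "best:", best_setting[model])
```

In these listed numbers, adding finger joints raises accuracy for all three models, while additionally including face joints gives a lower score than the body-plus-fingers setting in every case.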

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)