
NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang

Published: 2016-04-11 · CVPR 2016
Tasks: 3D Action Recognition · Action Classification · Skeleton Based Action Recognition · General Classification · Action Recognition
Links: Paper · PDF · Code · Code (official)

Abstract

Recent approaches in depth-based human activity analysis achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of training samples, distinct class labels, camera views, and variety of subjects. In this paper, we introduce a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects. Our dataset contains 60 different action classes including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art hand-crafted features on the suggested cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large-scale dataset will enable the community to apply, develop and adapt various data-hungry learning techniques for the task of depth-based and RGB+D-based human activity analysis.
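The proposed recurrent structure (the paper's part-aware LSTM) keeps a separate cell memory for each body-part group, with per-part input/forget gates and a single output gate shared across parts, so each part's motion is modeled independently while the output still sees the whole body. Below is a minimal PyTorch sketch of that idea; the five-part grouping, layer sizes, and every name in it are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of a part-aware LSTM cell: per-part cell memories and
# input/forget gates, one output gate shared across all parts.
# All dimensions and names are assumptions for illustration.
import torch
import torch.nn as nn

class PartAwareLSTMCell(nn.Module):
    def __init__(self, part_dims, hidden_per_part):
        super().__init__()
        self.num_parts = len(part_dims)
        total_hidden = self.num_parts * hidden_per_part
        # Per-part input gate, forget gate, and cell candidate, conditioned
        # on that part's features and the full previous hidden state.
        self.part_gates = nn.ModuleList(
            [nn.Linear(d + total_hidden, 3 * hidden_per_part) for d in part_dims]
        )
        # A single output gate shared across parts, conditioned on the
        # concatenated features of every part.
        self.out_gate = nn.Linear(sum(part_dims) + total_hidden, total_hidden)

    def forward(self, xs, state):
        # xs: list of per-part tensors, each of shape (batch, part_dim)
        # state: (h, cs) with h (batch, total_hidden) and cs a list of part cells
        h, cs = state
        new_cs = []
        for p, x in enumerate(xs):
            z = self.part_gates[p](torch.cat([x, h], dim=-1))
            i, f, g = z.chunk(3, dim=-1)
            c = torch.sigmoid(f) * cs[p] + torch.sigmoid(i) * torch.tanh(g)
            new_cs.append(c)
        o = torch.sigmoid(self.out_gate(torch.cat(xs + [h], dim=-1)))
        new_h = o * torch.tanh(torch.cat(new_cs, dim=-1))
        return new_h, new_cs

# Example: 5 body parts with 12 features each (e.g. 4 joints x 3D coords).
cell = PartAwareLSTMCell([12] * 5, hidden_per_part=64)
```

A full classifier would unroll this cell over the skeleton sequence and feed the final hidden state to a softmax over the 60 action classes.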

Results

On the source site the same numbers are cross-listed under many task tags (Video, Activity Recognition, 3D Action Recognition, Action Recognition, Action Detection, Action Localization, Temporal Action Localization, Zero-Shot Learning); the unique measurements, all on the NTU RGB+D benchmark, are:

Model           | Accuracy (CS, %) | Accuracy (CV, %)
Part-aware LSTM | 62.93            | 70.27
Deep LSTM       | 60.7             | 67.3

CS = cross-subject evaluation, CV = cross-view evaluation.
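Both evaluation criteria come from the dataset release: cross-subject (CS) fixes 20 of the 40 subjects for training and tests on the rest, while cross-view (CV) trains on cameras 2 and 3 and tests on camera 1. The sketch below derives the split from the standard NTU RGB+D file-naming convention SsssCcccPpppRrrrAaaa (setup, camera, performer, replication, action); the subject-ID list is the one published with the dataset, and the helper name is hypothetical.

```python
# Assign NTU RGB+D samples to train/test under the CS or CV protocol,
# based on the SsssCcccPpppRrrrAaaa file-naming convention.
import re

# The 20 training subjects of the cross-subject (CS) protocol.
CS_TRAIN_SUBJECTS = {1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19,
                     25, 27, 28, 31, 34, 35, 38}
# Cameras 2 and 3 train, camera 1 tests, under the cross-view (CV) protocol.
CV_TRAIN_CAMERAS = {2, 3}

NAME = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")

def split_of(filename, protocol="CS"):
    """Return 'train' or 'test' for one sample name (hypothetical helper)."""
    setup, camera, subject, rep, action = (
        int(g) for g in NAME.search(filename).groups()
    )
    if protocol == "CS":
        return "train" if subject in CS_TRAIN_SUBJECTS else "test"
    return "train" if camera in CV_TRAIN_CAMERAS else "test"

# e.g. split_of("S001C001P001R001A001.skeleton", "CV") -> "test"
```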

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)