
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


NTU RGB+D

Modalities: RGB-D, Videos
License: Custom (research-only, non-commercial, attribution)
Introduced: 2016-01-01

NTU RGB+D is a large-scale dataset for RGB-D human action recognition. It contains 56,880 samples of 60 action classes performed by 40 subjects. The actions fall into three categories: 40 daily actions (e.g., drinking, eating, reading), nine health-related actions (e.g., sneezing, staggering, falling down), and 11 mutual actions (e.g., punching, kicking, hugging). The actions take place under 17 different scene conditions, corresponding to 17 video sequences (S001–S017). They were captured by three cameras placed at different horizontal viewpoints: −45°, 0°, and +45°. Several modalities are provided for characterizing each action: depth maps, 3D skeleton joint positions, RGB frames, and infrared sequences. Performance is evaluated with two protocols: a cross-subject test, which splits the 40 subjects into a training group and a test group of 20 subjects each, and a cross-view test, which uses one camera (+45°) for testing and the other two cameras for training.

Source: Action Recognition for Depth Video using Multi-view Dynamic Images
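The two evaluation protocols can be sketched as a split over sample names. This is a minimal sketch, not official tooling, and it rests on several assumptions: that samples are named with the SsssCcccPpppRrrrAaaa pattern (setup, camera, performer, replication, action) used in the public release, that the held-out camera has ID 1 in those names, that the 20 training subjects are the list commonly used in public preprocessing scripts, and that the three action categories occupy contiguous ID ranges (1–40, 41–49, 50–60). Check each of these against the official documentation before relying on them; the function names are illustrative.

```python
import re
from typing import NamedTuple

# Assumption: sample names follow the SsssCcccPpppRrrrAaaa pattern,
# e.g. "S001C002P003R002A013.skeleton"; verify against your copy of the data.
_NAME_RE = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")

# Assumption: the 20 training subjects commonly used for the cross-subject
# protocol; the remaining 20 of the 40 subjects form the test group.
TRAIN_SUBJECTS = {1, 2, 4, 5, 8, 9, 13, 14, 15, 16,
                  17, 18, 19, 25, 27, 28, 31, 34, 35, 38}

# Assumption: camera 1 is the held-out view; cameras 2 and 3 are for training.
TRAIN_CAMERAS = {2, 3}


class Sample(NamedTuple):
    setup: int        # scene condition / video sequence, 1..17
    camera: int       # 1..3
    subject: int      # performer, 1..40
    replication: int  # repetition index
    action: int       # action class, 1..60


def parse_name(name: str) -> Sample:
    """Extract the five integer IDs from a sample name or file path."""
    m = _NAME_RE.search(name)
    if m is None:
        raise ValueError(f"not an NTU RGB+D sample name: {name!r}")
    return Sample(*(int(g) for g in m.groups()))


def action_category(action: int) -> str:
    """Map an action ID to its category (assumed contiguous ID ranges)."""
    if 1 <= action <= 40:
        return "daily"
    if 41 <= action <= 49:
        return "health"
    if 50 <= action <= 60:
        return "mutual"
    raise ValueError(f"action ID out of range: {action}")


def split(sample: Sample, protocol: str) -> str:
    """Return 'train' or 'test' under the cross-subject or cross-view protocol."""
    if protocol == "cross_subject":
        return "train" if sample.subject in TRAIN_SUBJECTS else "test"
    if protocol == "cross_view":
        return "train" if sample.camera in TRAIN_CAMERAS else "test"
    raise ValueError(f"unknown protocol: {protocol!r}")
```

Under these assumptions, `parse_name("S001C001P001R001A050.skeleton")` (subject 1, camera 1, action 50) lands in the training set for cross-subject, in the test set for cross-view, and in the mutual-action category. Keeping the ID lists as module-level sets makes each protocol explicit and easy to replace if your copy of the dataset documents a different split.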

Benchmarks

3D Action Recognition: Cross Subject Accuracy, Cross View Accuracy, Accuracy (CS), Accuracy (CV), Ensembled Modalities, GFLOPs per pred, Accuracy (12 unseen classes), Accuracy (5 unseen classes), Random Split Accuracy, Harmonic Mean (5 unseen classes), Harmonic Mean (12 unseen classes), Random Split Harmonic Mean
Action Detection: Accuracy (CS), Accuracy (CV), Ensembled Modalities, GFLOPs per pred
Action Localization: Cross Subject Accuracy, Cross View Accuracy, Accuracy (CS), Accuracy (CV), Ensembled Modalities, GFLOPs per pred, Accuracy (12 unseen classes), Accuracy (5 unseen classes), Random Split Accuracy, Harmonic Mean (5 unseen classes), Harmonic Mean (12 unseen classes), Random Split Harmonic Mean
Action Recognition: Accuracy (CS), Accuracy (CV), Cross Subject Accuracy, Cross View Accuracy, Ensembled Modalities, GFLOPs per pred, Accuracy (12 unseen classes), Accuracy (5 unseen classes), Random Split Accuracy, Harmonic Mean (5 unseen classes), Harmonic Mean (12 unseen classes), Random Split Harmonic Mean
Action Recognition In Videos: Accuracy (CS)
Activity Recognition: Accuracy (CS), Accuracy (CV), Cross Subject Accuracy, Cross View Accuracy, Ensembled Modalities, GFLOPs per pred, Accuracy (12 unseen classes), Accuracy (5 unseen classes), Random Split Accuracy, Harmonic Mean (5 unseen classes), Harmonic Mean (12 unseen classes), Random Split Harmonic Mean, FID (CS), FID (CV)
Human Interaction Recognition: Accuracy (Cross-Subject), Accuracy (Cross-View)
Human action generation: FID (CS), FID (CV)
Temporal Action Localization: Cross Subject Accuracy, Cross View Accuracy, Accuracy (CS), Accuracy (CV), Ensembled Modalities, GFLOPs per pred, Accuracy (12 unseen classes), Accuracy (5 unseen classes), Random Split Accuracy, Harmonic Mean (5 unseen classes), Harmonic Mean (12 unseen classes), Random Split Harmonic Mean
Video: Cross Subject Accuracy, Cross View Accuracy, Accuracy (CS), Accuracy (CV), Ensembled Modalities, GFLOPs per pred, Accuracy (12 unseen classes), Accuracy (5 unseen classes), Random Split Accuracy, Harmonic Mean (5 unseen classes), Harmonic Mean (12 unseen classes), Random Split Harmonic Mean
Zero-Shot Learning: Cross Subject Accuracy, Cross View Accuracy, Accuracy (CS), Accuracy (CV), Ensembled Modalities, GFLOPs per pred, Accuracy (12 unseen classes), Accuracy (5 unseen classes), Random Split Accuracy, Harmonic Mean (5 unseen classes), Harmonic Mean (12 unseen classes), Random Split Harmonic Mean
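Several benchmarks listed here report a harmonic mean over seen and unseen classes (5 or 12 unseen classes, or a random split). Assuming the standard generalized zero-shot definition, where S is top-1 accuracy on test samples of seen classes and U is top-1 accuracy on test samples of unseen classes, the metric is H = 2SU / (S + U). A minimal sketch (the function name is illustrative):

```python
def harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """Generalized zero-shot harmonic mean H = 2*S*U / (S + U).

    Both accuracies are fractions in [0, 1]; returns 0.0 when both are 0.
    """
    if not (0.0 <= acc_seen <= 1.0 and 0.0 <= acc_unseen <= 1.0):
        raise ValueError("accuracies must lie in [0, 1]")
    total = acc_seen + acc_unseen
    if total == 0.0:
        return 0.0
    return 2.0 * acc_seen * acc_unseen / total
```

The harmonic mean stays low unless both accuracies are high: `harmonic_mean(0.9, 0.1)` is about 0.18, whereas the arithmetic mean would be 0.5. Individual papers may average accuracy per class rather than per sample, so compare numbers only under the same convention.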

Related Benchmarks

NTU RGB+D 120
3D Action Recognition: Accuracy (10 unseen classes), Accuracy (24 unseen classes), Accuracy (Cross-Setup), Accuracy (Cross-Subject), Ensembled Modalities, GFLOPS per prediction, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Accuracy, Random Split Harmonic Mean
Action Detection: Accuracy (Cross-Setup), Accuracy (Cross-Subject), Ensembled Modalities, GFLOPS per prediction
Action Localization: Accuracy (10 unseen classes), Accuracy (24 unseen classes), Accuracy (Cross-Setup), Accuracy (Cross-Subject), Ensembled Modalities, GFLOPS per prediction, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Accuracy, Random Split Harmonic Mean
Action Recognition: Accuracy (10 unseen classes), Accuracy (24 unseen classes), Accuracy (Cross-Setup), Accuracy (Cross-Subject), Classifier, Encoder, Ensembled Modalities, GFLOPS per prediction, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Accuracy, Random Split Harmonic Mean, xset (%), xsub (%)
Activity Recognition: Accuracy (10 unseen classes), Accuracy (24 unseen classes), Accuracy (Cross-Setup), Accuracy (Cross-Subject), Classifier, Encoder, Ensembled Modalities, FID (CS), FID (CV), GFLOPS per prediction, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Accuracy, Random Split Harmonic Mean, xset (%), xsub (%)
Human Interaction Recognition: Accuracy, Accuracy (Cross-Setup), Accuracy (Cross-Subject)
Human action generation: FID (CS), FID (CV)
Temporal Action Localization: Accuracy (10 unseen classes), Accuracy (24 unseen classes), Accuracy (Cross-Setup), Accuracy (Cross-Subject), Ensembled Modalities, GFLOPS per prediction, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Accuracy, Random Split Harmonic Mean
Video: Accuracy (10 unseen classes), Accuracy (24 unseen classes), Accuracy (Cross-Setup), Accuracy (Cross-Subject), Ensembled Modalities, GFLOPS per prediction, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Accuracy, Random Split Harmonic Mean
Zero-Shot Learning: Accuracy (10 unseen classes), Accuracy (24 unseen classes), Accuracy (Cross-Setup), Accuracy (Cross-Subject), Ensembled Modalities, GFLOPS per prediction, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Accuracy, Random Split Harmonic Mean

NTU RGB+D 2D
Activity Recognition: MMDa (CS), MMDa (CV), MMDs (CS), MMDs (CV)
Human action generation: MMDa (CS), MMDa (CV), MMDs (CS), MMDs (CV)

Statistics

Papers: 476
Benchmarks: 95


Tasks

3D Action Recognition, Action Detection, Action Localization, Action Recognition, Action Recognition In Videos, Activity Recognition, Early Action Prediction, Generalized Zero Shot skeletal action recognition, Human Interaction Recognition, Human action generation, Pose Prediction, Self-supervised Skeleton-based Action Recognition, Skeleton Based Action Recognition, Temporal Action Localization, Unsupervised Skeleton Based Action Recognition, Video, Zero Shot Skeletal Action Recognition, Zero-Shot Learning