Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


NTU RGB+D 120

Modality: Videos
License: Custom (research-only)

NTU RGB+D 120 is a large-scale dataset for RGB+D human action recognition, collected from 106 distinct subjects. It contains more than 114,000 video samples and 8 million frames covering 120 action classes, including daily, mutual (two-person), and health-related activities.

Source: NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
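As an illustration of how per-sample metadata (subject, setup, action class) is typically recovered, here is a minimal sketch that parses a sample name. It assumes the released files follow the `SsssCcccPpppRrrrAaaa` naming convention (setup, camera, performer, replication, action, each a 3-digit number). The names `SampleID` and `parse_sample_name` are hypothetical helpers, not part of any official toolkit.

```python
import re
from dataclasses import dataclass

# Assumed naming convention: "S001C001P001R001A001" =
# setup, camera, performer (subject), replication, action, each 3 digits.
_NAME_RE = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")


@dataclass(frozen=True)
class SampleID:
    """Hypothetical container for the fields encoded in a sample name."""
    setup: int        # collection setup ID
    camera: int       # camera ID
    performer: int    # subject ID (106 subjects in NTU RGB+D 120)
    replication: int  # repetition index of the action
    action: int       # 1-based action class label (120 classes)


def parse_sample_name(name: str) -> SampleID:
    """Parse a name such as 'S001C001P001R001A001.skeleton' into its fields."""
    match = _NAME_RE.search(name)
    if match is None:
        raise ValueError(f"not an NTU RGB+D sample name: {name!r}")
    setup, camera, performer, replication, action = map(int, match.groups())
    return SampleID(setup, camera, performer, replication, action)


if __name__ == "__main__":
    sid = parse_sample_name("S018C002P042R001A120.skeleton")
    print(sid)
    # Most training code uses 0-based labels, so subtract 1 from the action ID.
    print("zero-based label:", sid.action - 1)
```

Keeping the parsed fields in a frozen dataclass makes them hashable, so samples can be grouped by subject or setup with ordinary dictionaries and sets.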

Benchmarks

3D Action Recognition: Accuracy (Cross-Subject), Accuracy (Cross-Setup), Ensembled Modalities, GFLOPS per prediction, Accuracy (10 unseen classes), Accuracy (24 unseen classes), Random Split Accuracy, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Harmonic Mean
Action Detection: Accuracy (Cross-Subject), Accuracy (Cross-Setup), Ensembled Modalities, GFLOPS per prediction
Action Localization: Accuracy (Cross-Subject), Accuracy (Cross-Setup), Ensembled Modalities, GFLOPS per prediction, Accuracy (10 unseen classes), Accuracy (24 unseen classes), Random Split Accuracy, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Harmonic Mean
Action Recognition: Accuracy (Cross-Setup), Accuracy (Cross-Subject), Ensembled Modalities, GFLOPS per prediction, Accuracy (10 unseen classes), Accuracy (24 unseen classes), Random Split Accuracy, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Harmonic Mean, xsub (%), xset (%), Encoder, Classifier
Activity Recognition: Accuracy (Cross-Setup), Accuracy (Cross-Subject), Ensembled Modalities, GFLOPS per prediction, Accuracy (10 unseen classes), Accuracy (24 unseen classes), Random Split Accuracy, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Harmonic Mean, xsub (%), xset (%), Encoder, Classifier, FID (CS), FID (CV)
Human Interaction Recognition: Accuracy (Cross-Setup), Accuracy (Cross-Subject), Accuracy
Human action generation: FID (CS), FID (CV)
Temporal Action Localization: Accuracy (Cross-Subject), Accuracy (Cross-Setup), Ensembled Modalities, GFLOPS per prediction, Accuracy (10 unseen classes), Accuracy (24 unseen classes), Random Split Accuracy, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Harmonic Mean
Video: Accuracy (Cross-Subject), Accuracy (Cross-Setup), Ensembled Modalities, GFLOPS per prediction, Accuracy (10 unseen classes), Accuracy (24 unseen classes), Random Split Accuracy, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Harmonic Mean
Zero-Shot Learning: Accuracy (Cross-Subject), Accuracy (Cross-Setup), Ensembled Modalities, GFLOPS per prediction, Accuracy (10 unseen classes), Accuracy (24 unseen classes), Random Split Accuracy, Harmonic Mean (10 unseen classes), Harmonic Mean (24 unseen classes), Random Split Harmonic Mean
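Several of the benchmark columns refer to evaluation protocols rather than tasks. The sketch below illustrates three of them under stated assumptions: Cross-Setup splitting by setup ID parity (even IDs for training, odd for testing, as we understand the protocol in the source paper), Cross-Subject splitting by a caller-supplied set of training subject IDs (the official list is in the source paper and is deliberately not hard-coded here), and the generalized zero-shot harmonic mean H = 2·S·U / (S + U) of seen accuracy S and unseen accuracy U, assuming the leaderboard's Harmonic Mean columns follow that common convention. Sample names are assumed to follow the `SsssCcccPpppRrrrAaaa` pattern; all function names are hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumed sample-name pattern "SsssCcccPpppRrrrAaaa"; captures setup and performer.
_NAME_RE = re.compile(r"S(\d{3})C\d{3}P(\d{3})R\d{3}A\d{3}")


def _setup_and_performer(name: str) -> Tuple[int, int]:
    """Return (setup ID, performer ID) encoded in a sample name."""
    match = _NAME_RE.search(name)
    if match is None:
        raise ValueError(f"not an NTU RGB+D sample name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def cross_setup_split(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Assumed protocol: even setup IDs go to training, odd ones to testing."""
    train: List[str] = []
    test: List[str] = []
    for name in names:
        setup, _ = _setup_and_performer(name)
        (train if setup % 2 == 0 else test).append(name)
    return train, test


def cross_subject_split(
    names: Iterable[str], train_subjects: Set[int]
) -> Tuple[List[str], List[str]]:
    """Samples whose performer ID is in `train_subjects` go to training.

    The official training-subject list comes from the source paper and must
    be supplied by the caller.
    """
    train: List[str] = []
    test: List[str] = []
    for name in names:
        _, performer = _setup_and_performer(name)
        (train if performer in train_subjects else test).append(name)
    return train, test


def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Generalized zero-shot harmonic mean H = 2*S*U / (S + U), 0 if S + U == 0."""
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)


if __name__ == "__main__":
    names = [
        "S001C001P001R001A001",
        "S002C003P008R002A060",
        "S032C002P106R001A120",
    ]
    print(cross_setup_split(names))
    # {1, 8} is a toy subject set for illustration, not the official split.
    print(cross_subject_split(names, {1, 8}))
    print(harmonic_mean(0.8, 0.4))
```

Reporting H alongside seen and unseen accuracy penalizes a model that trades one off against the other, because H stays low unless both accuracies are high.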

Statistics

Papers: 137
Benchmarks: 89

Links

Homepage

Tasks

3D Action Recognition
Action Detection
Action Localization
Action Recognition
Activity Recognition
Few-Shot Skeleton-Based Action Recognition
Generalized Zero Shot skeletal action recognition
Human Interaction Recognition
Human action generation
One-Shot 3D Action Recognition
Self-Supervised Human Action Recognition
Self-supervised Skeleton-based Action Recognition
Skeleton Based Action Recognition
Temporal Action Localization
Unsupervised Skeleton Based Action Recognition
Video
Zero Shot Skeletal Action Recognition
Zero-Shot Learning