Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Revisiting Skeleton-based Action Recognition

Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai

Date: 2021-04-28
Venue: CVPR 2022 1
Tasks: 3D Action Recognition, Skeleton Based Action Recognition, Pose Estimation, Group Activity Recognition, Action Recognition

Abstract

Human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt graph convolutional networks (GCN) to extract features on top of human skeletons. Despite the positive results shown in previous works, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons. Compared to GCN-based methods, PoseC3D is more effective in learning spatiotemporal features, more robust against pose estimation noises, and generalizes better in cross-dataset settings. Also, PoseC3D can handle multiple-person scenarios without additional computation cost, and its features can be easily integrated with other modalities at early fusion stages, which provides a great design space to further boost the performance. On four challenging datasets, PoseC3D consistently obtains superior performance, when used alone on skeletons and in combination with the RGB modality.
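
To make the "3D heatmap stack" representation concrete, here is a minimal NumPy sketch of one way to turn 2D keypoints into such a volume. The abstract does not specify the construction, so the details are assumptions made for illustration: one Gaussian per joint, scaled by the pose estimator's confidence score, with the function name `keypoints_to_heatmap_volume`, the `sigma` value, and the array layouts all hypothetical. Multiple people are merged with a pixel-wise maximum, so the output size does not depend on the number of persons.

```python
import numpy as np


def keypoints_to_heatmap_volume(keypoints, scores, height, width, sigma=0.6):
    """Stack per-joint Gaussian heatmaps along time (illustrative sketch).

    keypoints: (M, T, K, 2) array of (x, y) pixel coordinates for
               M persons, T frames, K joints (assumed layout).
    scores:    (M, T, K) array of keypoint confidences (assumed layout).
    Returns a float32 volume of shape (K, T, height, width).
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    num_persons, num_frames, num_joints, _ = keypoints.shape

    # Pixel coordinate grids, broadcast to (height, width) below.
    ys = np.arange(height, dtype=np.float32)[:, None]
    xs = np.arange(width, dtype=np.float32)[None, :]

    volume = np.zeros((num_joints, num_frames, height, width), dtype=np.float32)
    for m in range(num_persons):
        for t in range(num_frames):
            for k in range(num_joints):
                x, y = keypoints[m, t, k]
                # Gaussian centred on the joint, peak value = confidence.
                gauss = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
                # Pixel-wise max merges all persons into the same K channels.
                np.maximum(volume[k, t], scores[m, t, k] * gauss, out=volume[k, t])
    return volume
```

A volume of this shape has the same layout as a video clip with K channels instead of three, which is what lets a 3D convolutional network consume it in place of a graph sequence. Because persons are merged by a maximum rather than stacked, adding a second or third person leaves the network input unchanged in size.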

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Assembly101 | Actions Top-1 | 33.61 | RGBPoseConv3D |
| Video | Assembly101 | Object Top-1 | 42.9 | RGBPoseConv3D |
| Video | Assembly101 | Verbs Top-1 | 61.99 | RGBPoseConv3D |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 90.3 | PoseC3D (w. HRNet 2D Skeleton) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 86.9 | PoseC3D (w. HRNet 2D Skeleton) |
| Video | Kinetics-Skeleton dataset | Accuracy | 49.1 | PoseC3D (SlowOnly-346) |
| Video | Kinetics-Skeleton dataset | Accuracy | 47.7 | PoseC3D |
| Video | NTU RGB+D | Accuracy (CS) | 94.1 | PoseC3D [3D Heatmap] |
| Video | NTU RGB+D | Accuracy (CV) | 97.1 | PoseC3D [3D Heatmap] |
| Video | NTU RGB+D | Ensembled Modalities | 2 | PoseC3D [3D Heatmap] |
| Temporal Action Localization | Assembly101 | Actions Top-1 | 33.61 | RGBPoseConv3D |
| Temporal Action Localization | Assembly101 | Object Top-1 | 42.9 | RGBPoseConv3D |
| Temporal Action Localization | Assembly101 | Verbs Top-1 | 61.99 | RGBPoseConv3D |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 90.3 | PoseC3D (w. HRNet 2D Skeleton) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 86.9 | PoseC3D (w. HRNet 2D Skeleton) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 49.1 | PoseC3D (SlowOnly-346) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 47.7 | PoseC3D |
| Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 94.1 | PoseC3D [3D Heatmap] |
| Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 97.1 | PoseC3D [3D Heatmap] |
| Temporal Action Localization | NTU RGB+D | Ensembled Modalities | 2 | PoseC3D [3D Heatmap] |
| Zero-Shot Learning | Assembly101 | Actions Top-1 | 33.61 | RGBPoseConv3D |
| Zero-Shot Learning | Assembly101 | Object Top-1 | 42.9 | RGBPoseConv3D |
| Zero-Shot Learning | Assembly101 | Verbs Top-1 | 61.99 | RGBPoseConv3D |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 90.3 | PoseC3D (w. HRNet 2D Skeleton) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 86.9 | PoseC3D (w. HRNet 2D Skeleton) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 49.1 | PoseC3D (SlowOnly-346) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 47.7 | PoseC3D |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 94.1 | PoseC3D [3D Heatmap] |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 97.1 | PoseC3D [3D Heatmap] |
| Zero-Shot Learning | NTU RGB+D | Ensembled Modalities | 2 | PoseC3D [3D Heatmap] |
| Activity Recognition | Volleyball | Accuracy | 91.3 | PoseC3D (Pose Only) |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 97 | PoseC3D (RGB + Pose) |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 99.6 | PoseC3D (RGB + Pose) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 96.4 | PoseC3D (RGB + Pose) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 95.3 | PoseC3D (RGB + Pose) |
| Activity Recognition | H2O (2 Hands and Objects) | Actions Top-1 | 83.47 | RGBPoseConv3D |
| Activity Recognition | Assembly101 | Actions Top-1 | 33.61 | RGBPoseConv3D |
| Activity Recognition | Assembly101 | Object Top-1 | 42.9 | RGBPoseConv3D |
| Activity Recognition | Assembly101 | Verbs Top-1 | 61.99 | RGBPoseConv3D |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 90.3 | PoseC3D (w. HRNet 2D Skeleton) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 86.9 | PoseC3D (w. HRNet 2D Skeleton) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 49.1 | PoseC3D (SlowOnly-346) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 47.7 | PoseC3D |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 94.1 | PoseC3D [3D Heatmap] |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 97.1 | PoseC3D [3D Heatmap] |
| Activity Recognition | NTU RGB+D | Ensembled Modalities | 2 | PoseC3D [3D Heatmap] |
| Activity Recognition | Volleyball | Accuracy | 91.3 | PoseC3D (Pose-Only) |
| Action Localization | Assembly101 | Actions Top-1 | 33.61 | RGBPoseConv3D |
| Action Localization | Assembly101 | Object Top-1 | 42.9 | RGBPoseConv3D |
| Action Localization | Assembly101 | Verbs Top-1 | 61.99 | RGBPoseConv3D |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 90.3 | PoseC3D (w. HRNet 2D Skeleton) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 86.9 | PoseC3D (w. HRNet 2D Skeleton) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 49.1 | PoseC3D (SlowOnly-346) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 47.7 | PoseC3D |
| Action Localization | NTU RGB+D | Accuracy (CS) | 94.1 | PoseC3D [3D Heatmap] |
| Action Localization | NTU RGB+D | Accuracy (CV) | 97.1 | PoseC3D [3D Heatmap] |
| Action Localization | NTU RGB+D | Ensembled Modalities | 2 | PoseC3D [3D Heatmap] |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 90.3 | PoseC3D (w. HRNet 2D Skeleton) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 86.9 | PoseC3D (w. HRNet 2D Skeleton) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 49.1 | PoseC3D (SlowOnly-346) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 47.7 | PoseC3D |
| Action Detection | NTU RGB+D | Accuracy (CS) | 94.1 | PoseC3D [3D Heatmap] |
| Action Detection | NTU RGB+D | Accuracy (CV) | 97.1 | PoseC3D [3D Heatmap] |
| Action Detection | NTU RGB+D | Ensembled Modalities | 2 | PoseC3D [3D Heatmap] |
| 3D Action Recognition | Assembly101 | Actions Top-1 | 33.61 | RGBPoseConv3D |
| 3D Action Recognition | Assembly101 | Object Top-1 | 42.9 | RGBPoseConv3D |
| 3D Action Recognition | Assembly101 | Verbs Top-1 | 61.99 | RGBPoseConv3D |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 90.3 | PoseC3D (w. HRNet 2D Skeleton) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 86.9 | PoseC3D (w. HRNet 2D Skeleton) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 49.1 | PoseC3D (SlowOnly-346) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 47.7 | PoseC3D |
| 3D Action Recognition | NTU RGB+D | Accuracy (CS) | 94.1 | PoseC3D [3D Heatmap] |
| 3D Action Recognition | NTU RGB+D | Accuracy (CV) | 97.1 | PoseC3D [3D Heatmap] |
| 3D Action Recognition | NTU RGB+D | Ensembled Modalities | 2 | PoseC3D [3D Heatmap] |
| Action Recognition | Volleyball | Accuracy | 91.3 | PoseC3D (Pose Only) |
| Action Recognition | NTU RGB+D | Accuracy (CS) | 97 | PoseC3D (RGB + Pose) |
| Action Recognition | NTU RGB+D | Accuracy (CV) | 99.6 | PoseC3D (RGB + Pose) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 96.4 | PoseC3D (RGB + Pose) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 95.3 | PoseC3D (RGB + Pose) |
| Action Recognition | H2O (2 Hands and Objects) | Actions Top-1 | 83.47 | RGBPoseConv3D |
| Action Recognition | Assembly101 | Actions Top-1 | 33.61 | RGBPoseConv3D |
| Action Recognition | Assembly101 | Object Top-1 | 42.9 | RGBPoseConv3D |
| Action Recognition | Assembly101 | Verbs Top-1 | 61.99 | RGBPoseConv3D |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 90.3 | PoseC3D (w. HRNet 2D Skeleton) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 86.9 | PoseC3D (w. HRNet 2D Skeleton) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 49.1 | PoseC3D (SlowOnly-346) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 47.7 | PoseC3D |
| Action Recognition | NTU RGB+D | Accuracy (CS) | 94.1 | PoseC3D [3D Heatmap] |
| Action Recognition | NTU RGB+D | Accuracy (CV) | 97.1 | PoseC3D [3D Heatmap] |
| Action Recognition | NTU RGB+D | Ensembled Modalities | 2 | PoseC3D [3D Heatmap] |

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)