Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Skeleton-based Action Recognition with Convolutional Neural Networks

Chao Li, Qiaoyong Zhong, Di Xie, ShiLiang Pu

2017-04-25
Tasks: Action Detection, Action Classification, Skeleton Based Action Recognition, General Classification, Action Recognition, Temporal Action Localization

Abstract

Current state-of-the-art approaches to skeleton-based action recognition are mostly based on recurrent neural networks (RNN). In this paper, we propose a novel convolutional neural network (CNN) based framework for both action classification and detection. Raw skeleton coordinates as well as skeleton motion are fed directly into the CNN for label prediction. A novel skeleton transformer module is designed to rearrange and select important skeleton joints automatically. With a simple 7-layer network, we obtain 89.3% accuracy on the validation set of the NTU RGB+D dataset. For action detection in untrimmed videos, we develop a window proposal network to extract temporal segment proposals, which are further classified within the same network. On the recent PKU-MMD dataset, we achieve 93.7% mAP, surpassing the baseline by a large margin.
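
The abstract names two inputs (raw joint coordinates and skeleton motion) and a skeleton transformer that rearranges joints, but it gives no formulas. As a rough illustration only, the sketch below assumes that motion is the frame-to-frame difference of joint coordinates and that the transformer is a linear recombination of joints by a weight matrix. Both definitions, and the names `skeleton_motion` and `transform_joints`, are assumptions made here rather than details taken from this page. In a trained model the weights would be learned; here they are fixed by hand.

```python
from typing import List

Frame = List[List[float]]  # one frame: N joints, each [x, y, z]


def skeleton_motion(frames: List[Frame]) -> List[Frame]:
    """Per-joint displacement between consecutive frames.

    Assumed definition of "skeleton motion": frame[t + 1] - frame[t].
    Returns one fewer frame than the input.
    """
    return [
        [[b - a for a, b in zip(j0, j1)] for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]


def transform_joints(frame: Frame, weights: List[List[float]]) -> Frame:
    """Linearly recombine N input joints into M output joints.

    `weights` is a hypothetical M x N matrix. A permutation matrix
    reorders joints; a row with a single 1 selects one joint.
    """
    return [
        [sum(w * joint[axis] for w, joint in zip(row, frame)) for axis in range(3)]
        for row in weights
    ]


# Two frames with two joints each (toy data, not from any dataset).
frames = [
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    [[1.0, 0.0, 0.0], [1.0, 2.0, 1.0]],
]
motion = skeleton_motion(frames)
# motion == [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]]

swap = [[0.0, 1.0], [1.0, 0.0]]  # hand-picked permutation: swap the two joints
swapped = transform_joints(frames[0], swap)
# swapped == [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
```

A linear map is used here because it covers both "rearrange" (permutation) and "select" (sparse rows) in one operation; the actual module in the paper may differ.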

Results

Task                          Dataset    Metric         Value  Model
Video                         PKU-MMD    mAP@0.50 (CS)  90.4   Li et al. [[Li et al.2017b]]
Video                         PKU-MMD    mAP@0.50 (CV)  93.7   Li et al. [[Li et al.2017b]]
Video                         NTU RGB+D  Accuracy (CS)  83.2   CNN+Motion+Trans
Video                         NTU RGB+D  Accuracy (CV)  89.3   CNN+Motion+Trans
Temporal Action Localization  PKU-MMD    mAP@0.50 (CS)  90.4   Li et al. [[Li et al.2017b]]
Temporal Action Localization  PKU-MMD    mAP@0.50 (CV)  93.7   Li et al. [[Li et al.2017b]]
Temporal Action Localization  NTU RGB+D  Accuracy (CS)  83.2   CNN+Motion+Trans
Temporal Action Localization  NTU RGB+D  Accuracy (CV)  89.3   CNN+Motion+Trans
Zero-Shot Learning            PKU-MMD    mAP@0.50 (CS)  90.4   Li et al. [[Li et al.2017b]]
Zero-Shot Learning            PKU-MMD    mAP@0.50 (CV)  93.7   Li et al. [[Li et al.2017b]]
Zero-Shot Learning            NTU RGB+D  Accuracy (CS)  83.2   CNN+Motion+Trans
Zero-Shot Learning            NTU RGB+D  Accuracy (CV)  89.3   CNN+Motion+Trans
Activity Recognition          PKU-MMD    mAP@0.50 (CS)  90.4   Li et al. [[Li et al.2017b]]
Activity Recognition          PKU-MMD    mAP@0.50 (CV)  93.7   Li et al. [[Li et al.2017b]]
Activity Recognition          NTU RGB+D  Accuracy (CS)  83.2   CNN+Motion+Trans
Activity Recognition          NTU RGB+D  Accuracy (CV)  89.3   CNN+Motion+Trans
Action Localization           PKU-MMD    mAP@0.50 (CS)  90.4   Li et al. [[Li et al.2017b]]
Action Localization           PKU-MMD    mAP@0.50 (CV)  93.7   Li et al. [[Li et al.2017b]]
Action Localization           NTU RGB+D  Accuracy (CS)  83.2   CNN+Motion+Trans
Action Localization           NTU RGB+D  Accuracy (CV)  89.3   CNN+Motion+Trans
Action Detection              PKU-MMD    mAP@0.50 (CS)  90.4   Li et al. [[Li et al.2017b]]
Action Detection              PKU-MMD    mAP@0.50 (CV)  93.7   Li et al. [[Li et al.2017b]]
Action Detection              NTU RGB+D  Accuracy (CS)  83.2   CNN+Motion+Trans
Action Detection              NTU RGB+D  Accuracy (CV)  89.3   CNN+Motion+Trans
3D Action Recognition         PKU-MMD    mAP@0.50 (CS)  90.4   Li et al. [[Li et al.2017b]]
3D Action Recognition         PKU-MMD    mAP@0.50 (CV)  93.7   Li et al. [[Li et al.2017b]]
3D Action Recognition         NTU RGB+D  Accuracy (CS)  83.2   CNN+Motion+Trans
3D Action Recognition         NTU RGB+D  Accuracy (CV)  89.3   CNN+Motion+Trans
Action Recognition            PKU-MMD    mAP@0.50 (CS)  90.4   Li et al. [[Li et al.2017b]]
Action Recognition            PKU-MMD    mAP@0.50 (CV)  93.7   Li et al. [[Li et al.2017b]]
Action Recognition            NTU RGB+D  Accuracy (CS)  83.2   CNN+Motion+Trans
Action Recognition            NTU RGB+D  Accuracy (CV)  89.3   CNN+Motion+Trans
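
The PKU-MMD rows above report mAP@0.50, which this page does not define. A common reading, assumed here, is that a predicted temporal segment counts as correct when its intersection-over-union (IoU) with a ground-truth segment of the same class reaches 0.5. The sketch below shows only that matching step; the function names are hypothetical, and whether the boundary case uses `>=` or `>` depends on the benchmark's own evaluation script.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) in frames or seconds


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two 1-D temporal segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Segment, gt: Segment, threshold: float = 0.5) -> bool:
    """Assumed matching rule behind "@0.50": IoU at or above the threshold."""
    return temporal_iou(pred, gt) >= threshold


# Toy segments, not taken from PKU-MMD.
print(temporal_iou((0, 10), (2, 10)))      # overlap 8, union 10
print(is_true_positive((0, 10), (5, 15)))  # overlap 5, union 15
```

Average precision per class would then be computed from the ranked list of matched and unmatched proposals, and mAP is the mean over classes; that part is omitted here.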

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment (2025-06-25)
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans (2025-06-25)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)