Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A Comparative Review of Recent Kinect-based Action Recognition Algorithms

Lei Wang, Du Q. Huynh, Piotr Koniusz

2019-06-24 · Skeleton Based Action Recognition · Action Recognition · Temporal Action Localization

Paper · PDF · Code (official)

Abstract

Video-based human action recognition is currently one of the most active research areas in computer vision. Various research studies indicate that the performance of action recognition is highly dependent on the type of features being extracted and how the actions are represented. Since the release of the Kinect camera, a large number of Kinect-based human action recognition techniques have been proposed in the literature. However, there still does not exist a thorough comparison of these Kinect-based techniques under the grouping of feature types, such as handcrafted versus deep learning features and depth-based versus skeleton-based features. In this paper, we analyze and compare ten recent Kinect-based algorithms for both cross-subject action recognition and cross-view action recognition using six benchmark datasets. In addition, we have implemented and improved some of these techniques and included their variants in the comparison. Our experiments show that the majority of methods perform better on cross-subject action recognition than cross-view action recognition, that skeleton-based features are more robust for cross-view recognition than depth-based features, and that deep learning features are suitable for large datasets.

Results

Task | Dataset | Metric | Value | Model
Video | NTU RGB+D | Accuracy (CS) | 83.36 | ST-GCN-jpd
Video | NTU RGB+D | Accuracy (CV) | 88.84 | ST-GCN-jpd
Video | NTU RGB+D | Accuracy (CS) | 83 | IndRNN (with jpd)
Video | NTU RGB+D | Accuracy (CV) | 89 | IndRNN (with jpd)
Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 83.36 | ST-GCN-jpd
Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 88.84 | ST-GCN-jpd
Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 83 | IndRNN (with jpd)
Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 89 | IndRNN (with jpd)
Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 83.36 | ST-GCN-jpd
Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 88.84 | ST-GCN-jpd
Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 83 | IndRNN (with jpd)
Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 89 | IndRNN (with jpd)
Activity Recognition | NTU RGB+D | Accuracy (CS) | 83.36 | ST-GCN-jpd
Activity Recognition | NTU RGB+D | Accuracy (CV) | 88.84 | ST-GCN-jpd
Activity Recognition | NTU RGB+D | Accuracy (CS) | 83 | IndRNN (with jpd)
Activity Recognition | NTU RGB+D | Accuracy (CV) | 89 | IndRNN (with jpd)
Action Localization | NTU RGB+D | Accuracy (CS) | 83.36 | ST-GCN-jpd
Action Localization | NTU RGB+D | Accuracy (CV) | 88.84 | ST-GCN-jpd
Action Localization | NTU RGB+D | Accuracy (CS) | 83 | IndRNN (with jpd)
Action Localization | NTU RGB+D | Accuracy (CV) | 89 | IndRNN (with jpd)
Action Detection | NTU RGB+D | Accuracy (CS) | 83.36 | ST-GCN-jpd
Action Detection | NTU RGB+D | Accuracy (CV) | 88.84 | ST-GCN-jpd
Action Detection | NTU RGB+D | Accuracy (CS) | 83 | IndRNN (with jpd)
Action Detection | NTU RGB+D | Accuracy (CV) | 89 | IndRNN (with jpd)
3D Action Recognition | NTU RGB+D | Accuracy (CS) | 83.36 | ST-GCN-jpd
3D Action Recognition | NTU RGB+D | Accuracy (CV) | 88.84 | ST-GCN-jpd
3D Action Recognition | NTU RGB+D | Accuracy (CS) | 83 | IndRNN (with jpd)
3D Action Recognition | NTU RGB+D | Accuracy (CV) | 89 | IndRNN (with jpd)
Action Recognition | NTU RGB+D | Accuracy (CS) | 83.36 | ST-GCN-jpd
Action Recognition | NTU RGB+D | Accuracy (CV) | 88.84 | ST-GCN-jpd
Action Recognition | NTU RGB+D | Accuracy (CS) | 83 | IndRNN (with jpd)
Action Recognition | NTU RGB+D | Accuracy (CV) | 89 | IndRNN (with jpd)
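For readers scripting against these numbers, the per-model results above can be captured in a small structure. This is a minimal sketch using only values from the table; the variable names are illustrative, not part of any PWC API. CS and CV denote the NTU RGB+D cross-subject and cross-view evaluation protocols.

```python
# Accuracy (%) on NTU RGB+D, keyed by (model, protocol), as reported above.
results = {
    ("ST-GCN-jpd", "CS"): 83.36,
    ("ST-GCN-jpd", "CV"): 88.84,
    ("IndRNN (with jpd)", "CS"): 83.0,
    ("IndRNN (with jpd)", "CV"): 89.0,
}

# Compare the two protocols for each model: a positive gap means the
# cross-view (CV) split scores higher than the cross-subject (CS) split.
for model in sorted({m for m, _ in results}):
    gap = results[(model, "CV")] - results[(model, "CS")]
    print(f"{model}: CV - CS = {gap:+.2f} points")
```

For these two jpd variants the cross-view accuracy is the higher of the two, so the gap printed is positive for both models.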

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)