
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DSTSA-GCN: Advancing Skeleton-Based Gesture Recognition with Semantic-Aware Spatio-Temporal Topology Modeling

Hu Cui, Renjing Huang, Ruoyu Zhang, Tessai Hayama

2025-01-21
Tasks: Skeleton Based Action Recognition, Gesture Recognition, Hand Gesture Recognition, Action Recognition

Abstract

Graph convolutional networks (GCNs) have emerged as a powerful tool for skeleton-based action and gesture recognition, thanks to their ability to model spatial and temporal dependencies in skeleton data. However, existing GCN-based methods face critical limitations: (1) they lack effective spatio-temporal topology modeling that captures dynamic variations in skeletal motion, and (2) they struggle to model multiscale structural relationships beyond local joint connectivity. To address these issues, we propose a novel framework called Dynamic Spatial-Temporal Semantic Awareness Graph Convolutional Network (DSTSA-GCN). DSTSA-GCN introduces three key modules: Group Channel-wise Graph Convolution (GC-GC), Group Temporal-wise Graph Convolution (GT-GC), and Multi-Scale Temporal Convolution (MS-TCN). GC-GC and GT-GC operate in parallel to independently model channel-specific and frame-specific correlations, enabling robust topology learning that accounts for temporal variations. Additionally, both modules employ a grouping strategy to adaptively capture multiscale structural relationships. Complementing this, MS-TCN enhances temporal modeling through group-wise temporal convolutions with diverse receptive fields. Extensive experiments demonstrate that DSTSA-GCN significantly improves the topology modeling capabilities of GCNs, achieving state-of-the-art performance on benchmark datasets for gesture and action recognition, including SHREC17 Track, DHG-14/28, NTU-RGB+D, and NTU-RGB+D-120.
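
The grouping idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `grouped_graph_conv` and `multi_scale_temporal_conv` are hypothetical, the adjacency matrices are fixed inputs rather than learned dynamic topologies, and one shared temporal kernel stands in for learned per-group filters. It only shows, under those assumptions, what "one topology per channel group" and "one temporal receptive field per channel group" could look like on a tensor of shape (channels, frames, joints).

```python
import numpy as np


def grouped_graph_conv(x, adjacency):
    """Aggregate joint features with one adjacency matrix per channel group.

    x:         array of shape (C, T, V): channels, frames, joints.
    adjacency: array of shape (G, V, V): one joint-to-joint matrix per group.
               Assumption: fixed matrices; a real model would learn them.
    """
    C, T, V = x.shape
    G = adjacency.shape[0]
    if C % G != 0:
        raise ValueError("number of channels must be divisible by number of groups")
    xg = x.reshape(G, C // G, T, V)
    # out[g, c, t, v] = sum over w of adjacency[g, v, w] * xg[g, c, t, w]
    out = np.einsum("gvw,gctw->gctv", adjacency, xg)
    return out.reshape(C, T, V)


def multi_scale_temporal_conv(x, kernel, dilations):
    """Convolve each channel group along time with its own dilation.

    x:         array of shape (C, T, V).
    kernel:    1D sequence of odd length K (shared by all groups for simplicity).
    dilations: one dilation per group; a larger dilation gives a wider
               temporal receptive field for that group.
    """
    C, T, V = x.shape
    G = len(dilations)
    K = len(kernel)
    if C % G != 0:
        raise ValueError("number of channels must be divisible by number of groups")
    if K % 2 == 0:
        raise ValueError("kernel length must be odd")
    per_group = C // G
    out = np.zeros(x.shape, dtype=float)
    for g, d in enumerate(dilations):
        lo, hi = g * per_group, (g + 1) * per_group
        pad = d * (K - 1) // 2
        # Zero-pad the time axis so the output keeps T frames.
        xp = np.pad(x[lo:hi], ((0, 0), (pad, pad), (0, 0)))
        for k, w in enumerate(kernel):
            out[lo:hi] += w * xp[:, k * d : k * d + T, :]
    return out


if __name__ == "__main__":
    x = np.random.default_rng(0).normal(size=(8, 16, 22))  # toy sizes, not from the paper
    A = np.stack([np.eye(22)] * 4)                         # 4 groups, identity topologies
    y = grouped_graph_conv(x, A)
    z = multi_scale_temporal_conv(y, [1 / 3, 1 / 3, 1 / 3], dilations=[1, 2, 3, 4])
    print(y.shape, z.shape)
```

With identity adjacency matrices the graph step leaves the input unchanged, which makes the sketch easy to sanity-check before substituting other topologies.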

Results

Dataset | Metric | Value | Model
NTU RGB+D | Accuracy (CS) | 92.78 | DSTSA-GCN
NTU RGB+D | Accuracy (CV) | 97.03 | DSTSA-GCN
NTU RGB+D | Ensembled Modalities | 4 | DSTSA-GCN
NTU RGB+D 120 | Accuracy (Cross-Subject) | 89.12 | DSTSA-GCN
NTU RGB+D 120 | Accuracy (Cross-Setup) | 90.97 | DSTSA-GCN
NTU RGB+D 120 | Ensembled Modalities | 4 | DSTSA-GCN
SHREC 2017 track on 3D Hand Gesture Recognition | 14 gestures accuracy | 97.74 | DSTSA-GCN
SHREC 2017 track on 3D Hand Gesture Recognition | 28 gestures accuracy | 95.37 | DSTSA-GCN
N-UCLA | Accuracy | 96.98 | DSTSA-GCN
DHG-14 | Accuracy | 95.04 | DSTSA-GCN
DHG-28 | Accuracy | 93.57 | DSTSA-GCN

The NTU RGB+D, NTU RGB+D 120, SHREC 2017, and N-UCLA results are listed identically under each of the following tasks: Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, and Action Recognition. The DHG-14 and DHG-28 results are listed under Hand and Gesture Recognition.

Related Papers

Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions (2025-07-06)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? (2025-06-25)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)