Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition

Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin

2022-10-12 | Skeleton Based Action Recognition | Action Recognition

Abstract

Graph convolution networks (GCN) have been widely used in skeleton-based action recognition. We note that existing GCN-based approaches primarily rely on prescribed graphical structures (i.e., a manually defined topology of skeleton joints), which limits their flexibility to capture complicated correlations between joints. To move beyond this limitation, we propose a new framework for skeleton-based action recognition, namely Dynamic Group Spatio-Temporal GCN (DG-STGCN). It consists of two modules, DG-GCN and DG-TCN, for spatial and temporal modeling, respectively. In particular, DG-GCN uses learned affinity matrices to capture dynamic graphical structures instead of relying on a prescribed one, while DG-TCN performs group-wise temporal convolutions with varying receptive fields and incorporates a dynamic joint-skeleton fusion module for adaptive multi-level temporal modeling. On a wide range of benchmarks, including NTU RGB+D, Kinetics-Skeleton, BABEL, and Toyota SmartHome, DG-STGCN consistently outperforms state-of-the-art methods, often by a notable margin.
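To make the idea of "learned affinity matrices" concrete, here is a minimal NumPy sketch of group-wise joint aggregation. It is an illustration only, not the authors' implementation: the function name `grouped_affinity_aggregate`, the tensor layout (channels, frames, joints), the group count, and the random initialization are all assumptions, and the abstract does not specify how DG-GCN builds or refines its affinities.

```python
import numpy as np

def grouped_affinity_aggregate(x, affinity):
    """Aggregate joint features with one affinity matrix per channel group.

    x:        (C, T, V) features: channels, frames, joints (assumed layout).
    affinity: (K, V, V) one joint-to-joint matrix per group. In a trained
              model these would be learnable parameters, not a fixed skeleton.
    """
    c, t, v = x.shape
    k = affinity.shape[0]
    if c % k != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    # Split the channels into K groups: (K, C // K, T, V).
    xg = x.reshape(k, c // k, t, v)
    # Within group g, each output joint v is a weighted sum over input joints u.
    out = np.einsum("gctu,guv->gctv", xg, affinity)
    return out.reshape(c, t, v)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 25))               # 8 channels, 16 frames, 25 joints
affinity = rng.standard_normal((4, 25, 25)) * 0.1  # hypothetical stand-in for learned weights
y = grouped_affinity_aggregate(x, affinity)
print(y.shape)  # (8, 16, 25)
```

Because nothing in `affinity` is tied to the physical skeleton, any pair of joints can exchange information, which is the flexibility the abstract contrasts with a prescribed topology.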

Results

Task | Dataset | Metric | Value | Model
Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 91.3 | DG-STGCN
Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 89.6 | DG-STGCN
Video | NTU RGB+D 120 | Ensembled Modalities | 4 | DG-STGCN
Video | NTU RGB+D | Accuracy (CS) | 93.2 | DG-STGCN
Video | NTU RGB+D | Accuracy (CV) | 97.5 | DG-STGCN
Video | NTU RGB+D | Ensembled Modalities | 4 | DG-STGCN
Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 91.3 | DG-STGCN
Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 89.6 | DG-STGCN
Temporal Action Localization | NTU RGB+D 120 | Ensembled Modalities | 4 | DG-STGCN
Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 93.2 | DG-STGCN
Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 97.5 | DG-STGCN
Temporal Action Localization | NTU RGB+D | Ensembled Modalities | 4 | DG-STGCN
Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 91.3 | DG-STGCN
Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 89.6 | DG-STGCN
Zero-Shot Learning | NTU RGB+D 120 | Ensembled Modalities | 4 | DG-STGCN
Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 93.2 | DG-STGCN
Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 97.5 | DG-STGCN
Zero-Shot Learning | NTU RGB+D | Ensembled Modalities | 4 | DG-STGCN
Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 91.3 | DG-STGCN
Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 89.6 | DG-STGCN
Activity Recognition | NTU RGB+D 120 | Ensembled Modalities | 4 | DG-STGCN
Activity Recognition | NTU RGB+D | Accuracy (CS) | 93.2 | DG-STGCN
Activity Recognition | NTU RGB+D | Accuracy (CV) | 97.5 | DG-STGCN
Activity Recognition | NTU RGB+D | Ensembled Modalities | 4 | DG-STGCN
Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 91.3 | DG-STGCN
Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 89.6 | DG-STGCN
Action Localization | NTU RGB+D 120 | Ensembled Modalities | 4 | DG-STGCN
Action Localization | NTU RGB+D | Accuracy (CS) | 93.2 | DG-STGCN
Action Localization | NTU RGB+D | Accuracy (CV) | 97.5 | DG-STGCN
Action Localization | NTU RGB+D | Ensembled Modalities | 4 | DG-STGCN
Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 91.3 | DG-STGCN
Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 89.6 | DG-STGCN
Action Detection | NTU RGB+D 120 | Ensembled Modalities | 4 | DG-STGCN
Action Detection | NTU RGB+D | Accuracy (CS) | 93.2 | DG-STGCN
Action Detection | NTU RGB+D | Accuracy (CV) | 97.5 | DG-STGCN
Action Detection | NTU RGB+D | Ensembled Modalities | 4 | DG-STGCN
3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 91.3 | DG-STGCN
3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 89.6 | DG-STGCN
3D Action Recognition | NTU RGB+D 120 | Ensembled Modalities | 4 | DG-STGCN
3D Action Recognition | NTU RGB+D | Accuracy (CS) | 93.2 | DG-STGCN
3D Action Recognition | NTU RGB+D | Accuracy (CV) | 97.5 | DG-STGCN
3D Action Recognition | NTU RGB+D | Ensembled Modalities | 4 | DG-STGCN
Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 91.3 | DG-STGCN
Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 89.6 | DG-STGCN
Action Recognition | NTU RGB+D 120 | Ensembled Modalities | 4 | DG-STGCN
Action Recognition | NTU RGB+D | Accuracy (CS) | 93.2 | DG-STGCN
Action Recognition | NTU RGB+D | Accuracy (CV) | 97.5 | DG-STGCN
Action Recognition | NTU RGB+D | Ensembled Modalities | 4 | DG-STGCN

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)