Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Hierarchically Decomposed Graph Convolutional Networks for Skeleton-Based Action Recognition

Jungho Lee, Minhyeok Lee, Dogyoon Lee, Sangyoun Lee

2022-08-23 · ICCV 2023 · Skeleton Based Action Recognition · Action Recognition

Abstract

Graph convolutional networks (GCNs) are the most commonly used methods for skeleton-based action recognition and have achieved remarkable performance. Generating adjacency matrices with semantically meaningful edges is particularly important for this task, but extracting such edges is a challenging problem. To solve this, we propose a hierarchically decomposed graph convolutional network (HD-GCN) architecture with a novel hierarchically decomposed graph (HD-Graph). The proposed HD-GCN effectively decomposes every joint node into several sets to extract major structurally adjacent and distant edges, and uses them to construct an HD-Graph containing those edges in the same semantic spaces of a human skeleton. In addition, we introduce an attention-guided hierarchy aggregation (A-HA) module to highlight the dominant hierarchical edge sets of the HD-Graph. Furthermore, we apply a new six-way ensemble method, which uses only joint and bone streams without any motion stream. The proposed model is evaluated and achieves state-of-the-art performance on four large, popular datasets. Finally, we demonstrate the effectiveness of our model with various comparative experiments.
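
As a rough illustration of the decomposition idea described in the abstract, the sketch below groups the joints of a skeleton into hierarchy sets and builds one adjacency matrix per pair of neighboring sets. It is not the authors' implementation: grouping joints by hop distance from a chosen root, fully connecting neighboring sets, the function names `hierarchy_levels` and `level_pair_adjacency`, and the 7-joint toy skeleton are all assumptions made for illustration. The skeleton is assumed to be connected.

```python
from collections import deque


def hierarchy_levels(edges, num_joints, root):
    """Group joints by hop distance from `root` using breadth-first search.

    Assumes the skeleton graph is connected.
    """
    neighbors = [[] for _ in range(num_joints)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    depth = [-1] * num_joints
    depth[root] = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if depth[nxt] == -1:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    levels = [[] for _ in range(max(depth) + 1)]
    for joint, d in enumerate(depth):
        levels[d].append(joint)
    return levels


def level_pair_adjacency(levels, num_joints):
    """Build one symmetric binary adjacency matrix per pair of neighboring levels.

    Every joint in one level is linked to every joint in the next level
    (an illustrative assumption), so joints that are not physically
    connected can still share an edge.
    """
    matrices = []
    for upper, lower in zip(levels, levels[1:]):
        adj = [[0] * num_joints for _ in range(num_joints)]
        for i in upper:
            for j in lower:
                adj[i][j] = 1
                adj[j][i] = 1
        matrices.append(adj)
    return matrices


# Hypothetical 7-joint toy skeleton (not a real dataset layout):
# 0 = torso, 1 = neck, 2 = head, 3/4 = left/right shoulder, 5/6 = left/right hand
TOY_EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (3, 5), (4, 6)]

levels = hierarchy_levels(TOY_EDGES, num_joints=7, root=0)
matrices = level_pair_adjacency(levels, num_joints=7)
```

In this toy example the last matrix links the head (joint 2) to both hands (joints 5 and 6), even though no bone connects them. This is one way to picture how a hierarchy-based graph can contain structurally distant edges alongside adjacent ones.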

Results

The same results are listed under each of the following task categories: Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, Action Recognition.

| Dataset | Metric | Value | Model |
| --- | --- | --- | --- |
| NTU RGB+D 120 | Accuracy (Cross-Setup) | 91.6 | HD-GCN |
| NTU RGB+D 120 | Accuracy (Cross-Subject) | 90.1 | HD-GCN |
| NTU RGB+D 120 | Ensembled Modalities | 6 | HD-GCN |
| Kinetics-Skeleton dataset | Accuracy | 40.9 | HD-GCN |
| N-UCLA | Accuracy | 97.2 | HD-GCN |
| NTU RGB+D | Accuracy (CS) | 93.4 | HD-GCN |
| NTU RGB+D | Accuracy (CV) | 97.2 | HD-GCN |
| NTU RGB+D | Ensembled Modalities | 6 | HD-GCN |

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)