Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Dynamic GCN: Context-enriched Topology Learning for Skeleton-based Action Recognition

Fanfan Ye, ShiLiang Pu, Qiaoyong Zhong, Chao Li, Di Xie, Huiming Tang

2020-07-29 | Skeleton Based Action Recognition | Action Recognition

Abstract

Graph Convolutional Networks (GCNs) have attracted increasing interest for the task of skeleton-based action recognition. The key lies in the design of the graph structure, which encodes skeleton topology information. In this paper, we propose Dynamic GCN, in which a novel convolutional neural network named Context-encoding Network (CeN) is introduced to learn skeleton topology automatically. In particular, when learning the dependency between two joints, contextual features from the remaining joints are incorporated in a global manner. CeN is extremely lightweight yet effective, and can be embedded into a graph convolutional layer. By stacking multiple CeN-enabled graph convolutional layers, we build Dynamic GCN. Notably, as a merit of CeN, dynamic graph topologies are constructed for different input samples as well as for graph convolutional layers of various depths. In addition, three alternative context modeling architectures are explored, which may serve as a guideline for future research on graph topology learning. CeN adds only ~7% extra FLOPs to the baseline model, and Dynamic GCN achieves better performance with $2\times$ to $4\times$ fewer FLOPs than existing methods. By further combining static physical body connections and motion modalities, we achieve state-of-the-art performance on three large-scale benchmarks, namely NTU-RGB+D, NTU-RGB+D 120 and Skeleton-Kinetics.
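
The idea of a sample-dependent topology feeding a graph convolution can be illustrated with a small pure-Python sketch. This is not the authors' CeN architecture: the function names (`dynamic_adjacency`, `graph_conv`), the weight shapes, the softmax row normalisation, and the random weights are all assumptions made for illustration. It only shows the general pattern the abstract describes, where every edge weight is predicted from the features of all joints and the resulting adjacency is then used to aggregate joint features.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_adjacency(x, w_squeeze, w_context):
    """Predict an N x N adjacency from N x C joint features (hypothetical design).

    w_squeeze has length C; w_context has shape (N*N) x N.
    """
    n = len(x)
    # Squeeze each joint's C channels to a single scalar.
    s = matvec(x, w_squeeze)
    # Each edge score is a weighted sum over ALL joint scalars,
    # so the dependency between joints i and j sees global context.
    flat = matvec(w_context, s)
    # Row-normalise so each joint's incoming weights sum to 1 (assumption).
    return [softmax(flat[i * n:(i + 1) * n]) for i in range(n)]


def graph_conv(x, adj, w_out):
    """y_i = sum_j adj[i][j] * (x_j projected by w_out); w_out is C x D."""
    cols = list(zip(*w_out))
    # Project every joint from C to D channels.
    h = [[sum(a * b for a, b in zip(xj, col)) for col in cols] for xj in x]
    # Aggregate projected features over the predicted topology.
    return [
        [sum(adj[i][j] * h[j][d] for j in range(len(x))) for d in range(len(cols))]
        for i in range(len(x))
    ]


# Toy example: 5 joints, 4 input channels, 3 output channels, random weights.
rng = random.Random(0)
N, C, D = 5, 4, 3


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


w_squeeze = [rng.uniform(-1, 1) for _ in range(C)]
w_context = rand_matrix(N * N, N)
w_out = rand_matrix(C, D)
x = rand_matrix(N, C)

adj = dynamic_adjacency(x, w_squeeze, w_context)
y = graph_conv(x, adj, w_out)
```

Because the adjacency is recomputed from `x`, two different input samples generally yield two different topologies, and a separate set of weights per layer would yield a different topology at each depth.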

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Kinetics-Skeleton dataset | Accuracy | 37.9 | Dynamic GCN |
| Video | NTU RGB+D | Accuracy (CS) | 91.5 | Dynamic GCN |
| Video | NTU RGB+D | Accuracy (CV) | 96 | Dynamic GCN |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 37.9 | Dynamic GCN |
| Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 91.5 | Dynamic GCN |
| Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 96 | Dynamic GCN |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 37.9 | Dynamic GCN |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 91.5 | Dynamic GCN |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 96 | Dynamic GCN |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 37.9 | Dynamic GCN |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 91.5 | Dynamic GCN |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 96 | Dynamic GCN |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 37.9 | Dynamic GCN |
| Action Localization | NTU RGB+D | Accuracy (CS) | 91.5 | Dynamic GCN |
| Action Localization | NTU RGB+D | Accuracy (CV) | 96 | Dynamic GCN |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 37.9 | Dynamic GCN |
| Action Detection | NTU RGB+D | Accuracy (CS) | 91.5 | Dynamic GCN |
| Action Detection | NTU RGB+D | Accuracy (CV) | 96 | Dynamic GCN |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 37.9 | Dynamic GCN |
| 3D Action Recognition | NTU RGB+D | Accuracy (CS) | 91.5 | Dynamic GCN |
| 3D Action Recognition | NTU RGB+D | Accuracy (CV) | 96 | Dynamic GCN |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 37.9 | Dynamic GCN |
| Action Recognition | NTU RGB+D | Accuracy (CS) | 91.5 | Dynamic GCN |
| Action Recognition | NTU RGB+D | Accuracy (CV) | 96 | Dynamic GCN |

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)