Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks

Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu

2019-12-15 · Skeleton Based Action Recognition · Action Recognition · Temporal Action Localization · graph construction

Paper · PDF · Code · Code (official)

Abstract

Graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, have achieved remarkable performance for skeleton-based action recognition. However, there still exist several issues in the previous GCN-based models. First, the topology of the graph is set heuristically and fixed over all the model layers and input data. This may not be suitable for the hierarchy of the GCN model and the diversity of the data in action recognition tasks. Second, the second-order information of the skeleton data, i.e., the length and orientation of the bones, is rarely investigated, although it is naturally more informative and discriminative for human action recognition. In this work, we propose a novel multi-stream attention-enhanced adaptive graph convolutional neural network (MS-AAGCN) for skeleton-based action recognition. The graph topology in our model can be either uniformly or individually learned based on the input data in an end-to-end manner. This data-driven approach increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Besides, the proposed adaptive graph convolutional layer is further enhanced by a spatial-temporal-channel attention module, which helps the model pay more attention to important joints, frames and features. Moreover, the information of both the joints and the bones, together with their motion information, is simultaneously modeled in a multi-stream framework, which brings notable improvement in recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art by a significant margin.
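The two data-side ideas from the abstract — second-order bone information and frame-to-frame motion streams — plus the learned (rather than heuristic) adjacency can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the skeleton size, parent list, and random adjacency below are hypothetical placeholders, and the paper's full model also includes the attention module and end-to-end training, which are omitted here.

```python
import numpy as np

# Hypothetical skeleton clip: C=3 coordinates, T=4 frames, V=5 joints.
# The parent list is illustrative, not the NTU RGB+D skeleton definition.
C, T, V = 3, 4, 5
rng = np.random.default_rng(0)
joints = rng.standard_normal((C, T, V))

# Bone stream (second-order information): each bone is the vector from a
# joint to its parent, so it encodes bone length and orientation.
parents = [0, 0, 1, 2, 3]  # joint 0 is the root; its bone vector is zero
bones = joints - joints[:, :, parents]

# Motion streams: frame-to-frame differences of joints and bones.
joint_motion = np.zeros_like(joints)
joint_motion[:, :-1] = joints[:, 1:] - joints[:, :-1]
bone_motion = np.zeros_like(bones)
bone_motion[:, :-1] = bones[:, 1:] - bones[:, :-1]

# A minimal "adaptive" graph aggregation: the adjacency is a free
# parameter (random here, learned end-to-end in the paper), normalized
# with a row-wise softmax instead of being a fixed heuristic skeleton graph.
A = rng.standard_normal((V, V))
A = np.exp(A) / np.exp(A).sum(axis=1, keepdims=True)
out = np.einsum('ctv,vw->ctw', joints, A)  # aggregate features over joints

print(joints.shape, bones.shape, out.shape)  # all (3, 4, 5)
```

In the multi-stream framework, each of the four tensors (`joints`, `bones`, `joint_motion`, `bone_motion`) would feed its own stream of such graph-convolutional layers, with the stream outputs fused for the final prediction.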

Results

The page lists identical figures under several task categories (Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, Action Recognition). The distinct results are:

| Dataset | Metric | MS-AAGCN | JB-AAGCN |
| --- | --- | --- | --- |
| Kinetics-Skeleton | Accuracy | 37.8 | 37.4 |
| NTU RGB+D | Accuracy (CS) | 90.0 | 89.4 |
| NTU RGB+D | Accuracy (CV) | 96.2 | 96.0 |

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Efficiently Constructing Sparse Navigable Graphs (2025-07-17)
NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)