Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition

Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu

Published 2018-05-20 · CVPR 2019

Tasks: 3D Action Recognition · Skeleton Based Action Recognition · Action Recognition · Temporal Action Localization · Graph Construction · Vocal Bursts Valence Prediction

Abstract

In skeleton-based action recognition, graph convolutional networks (GCNs), which model the human body skeleton as a spatiotemporal graph, have achieved remarkable performance. However, in existing GCN-based methods, the topology of the graph is set manually and is fixed over all layers and input samples. This may not be optimal for the hierarchical GCN and the diverse samples in action recognition tasks. In addition, the second-order information of the skeleton data (the lengths and directions of bones), which is naturally more informative and discriminative for action recognition, is rarely investigated in existing methods. In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. The topology of the graph in our model can be either uniformly or individually learned by the BP algorithm in an end-to-end manner. This data-driven approach increases the flexibility of the model for graph construction and gives it more generality to adapt to various data samples. Moreover, a two-stream framework is proposed to model the first-order and second-order information simultaneously, which yields a notable improvement in recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art by a significant margin.
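The two ideas in the abstract — a graph topology that is partly learned rather than fixed, and a second (bone) stream built from joint differences — can be sketched in a few lines. This is an illustrative sketch in plain Python, not the paper's released code: the names (`adaptive_adjacency`, `graph_conv`, `bones_from_joints`) are my own, and the paper's data-dependent adjacency term and temporal convolutions are omitted.

```python
# Minimal sketch of two ideas from 2s-AGCN (illustrative, not the
# authors' implementation; the data-dependent adjacency term and the
# temporal dimension are omitted for brevity).

def adaptive_adjacency(A_fixed, B_learned):
    """Effective topology = fixed skeleton adjacency + a freely
    learnable matrix, so training can strengthen, weaken, or add
    edges per layer instead of keeping the hand-set graph."""
    n = len(A_fixed)
    return [[A_fixed[i][j] + B_learned[i][j] for j in range(n)]
            for i in range(n)]

def graph_conv(X, A, W):
    """One graph-convolution step Y = A @ X @ W, with X a list of
    per-joint feature vectors and W a (c_in x c_out) weight matrix."""
    n, c_in, c_out = len(X), len(X[0]), len(W[0])
    AX = [[sum(A[i][k] * X[k][c] for k in range(n)) for c in range(c_in)]
          for i in range(n)]
    return [[sum(AX[i][c] * W[c][o] for c in range(c_in))
             for o in range(c_out)] for i in range(n)]

def bones_from_joints(joints, parent):
    """Second-order (bone-stream) input: each bone is the vector from
    a joint's parent to the joint; the root joint (its own parent)
    gets a zero vector."""
    return [[joints[j][d] - joints[parent[j]][d]
             for d in range(len(joints[j]))]
            if parent[j] != j else [0.0] * len(joints[j])
            for j in range(len(joints))]
```

In the full model, layers like `graph_conv` (with the adaptive adjacency) are stacked over the spatiotemporal skeleton graph, and two such networks — one fed joint coordinates, one fed the bone vectors — are trained separately, with their softmax scores fused at test time.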

Results

The same seven benchmark entries are repeated under each of the task tags Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, and Action Recognition (the Action Detection tag lists only the UAV-Human and NTU RGB+D entries). The distinct results are:

| Dataset | Metric | Value | Model |
| --- | --- | --- | --- |
| Assembly101 | Actions Top-1 | 26.7 | 2s-AGCN |
| Assembly101 | Object Top-1 | 33.8 | 2s-AGCN |
| Assembly101 | Verbs Top-1 | 64.4 | 2s-AGCN |
| UAV-Human | CSv1 (%) | 34.84 | 2S-AGCN |
| UAV-Human | CSv2 (%) | 66.68 | 2S-AGCN |
| NTU RGB+D | Accuracy (CS) | 88.5 | 2s-NLGCN |
| NTU RGB+D | Accuracy (CV) | 95.1 | 2s-NLGCN |

Related Papers

- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- Efficiently Constructing Sparse Navigable Graphs (2025-07-17)
- NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation (2025-07-17)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
- EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
- Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
- CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)