


Spatial Temporal Graph Attention Network for Skeleton-Based Action Recognition

Lianyu Hu, Shenglan Liu, Wei Feng

2022-08-18 · Skeleton Based Action Recognition · Action Recognition · Graph Attention
Paper · PDF · Code (official)

Abstract

Current methods in skeleton-based action recognition mainly focus on capturing long-term temporal dependencies, since skeleton sequences are typically long (>128 frames), which poses a challenging problem for previous approaches. Under these conditions, short-term dependencies, which are critical for classifying similar actions, are rarely modeled explicitly. Most current approaches consist of interleaved spatial-only and temporal-only modules, in which direct information flow among joints in adjacent frames is hindered, making them inferior at capturing short-term motion and distinguishing similar action pairs. To address this limitation, we propose a general framework, coined STGAT, to model cross-spacetime information flow. It equips the spatial-only modules with spatial-temporal modeling for regional perception. While STGAT is theoretically effective for spatial-temporal modeling, we propose three simple modules that reduce local spatial-temporal feature redundancy and further unlock the potential of STGAT: they (1) narrow the scope of the self-attention mechanism, (2) dynamically weight joints along the temporal dimension, and (3) separate subtle motion from static features, respectively. As a robust feature extractor, STGAT generalizes better than previous methods when classifying similar actions, as evidenced by both qualitative and quantitative results. STGAT achieves state-of-the-art performance on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400. Code is released.
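The core idea in the abstract is letting joints attend across adjacent frames directly (cross-spacetime information flow) while narrowing self-attention to a local spatio-temporal window, per module (1). The sketch below is a minimal PyTorch illustration of that idea under stated assumptions, not the official implementation; the class name, parameters, and layer choices are hypothetical (see the linked official code for the actual method).

```python
# Minimal sketch of local spatio-temporal self-attention in the spirit of
# STGAT. Hypothetical names throughout; not the official implementation.
# Each joint attends to all joints within a short temporal window, so
# information flows directly among joints in adjacent frames.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalSpatioTemporalAttention(nn.Module):
    """Self-attention over joints within a `window`-frame neighborhood.

    Input and output shape: (N, C, T, V) -- batch, channels, frames, joints.
    """
    def __init__(self, channels: int, window: int = 3, heads: int = 4):
        super().__init__()
        assert channels % heads == 0 and window % 2 == 1
        self.window = window          # temporal extent of the local neighborhood
        self.heads = heads
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        N, C, T, V = x.shape
        w, h, d = self.window, self.heads, C // self.heads
        q, k, v = self.qkv(x).chunk(3, dim=1)                   # each (N, C, T, V)
        # Pad time so every frame has a full window, then unfold keys/values
        # into overlapping windows of w frames: (N, C, T, V, w).
        pad = w // 2
        k = F.pad(k, (0, 0, pad, pad)).unfold(2, w, 1)
        v = F.pad(v, (0, 0, pad, pad)).unfold(2, w, 1)
        # Multi-head form: each of the V queries sees V*w spatio-temporal keys.
        q = q.reshape(N, h, d, T, V).permute(0, 1, 3, 4, 2)     # (N, h, T, V, d)
        k = k.reshape(N, h, d, T, V, w).permute(0, 1, 3, 4, 5, 2).reshape(N, h, T, V * w, d)
        v = v.reshape(N, h, d, T, V, w).permute(0, 1, 3, 4, 5, 2).reshape(N, h, T, V * w, d)
        attn = torch.softmax(q @ k.transpose(-1, -2) / d ** 0.5, dim=-1)
        out = (attn @ v).permute(0, 1, 4, 2, 3).reshape(N, C, T, V)
        return x + self.proj(out)                               # residual connection

# Usage: 2 skeleton clips, 64 channels, 128 frames, 25 joints (NTU layout).
x = torch.randn(2, 64, 128, 25)
print(LocalSpatioTemporalAttention(64)(x).shape)  # torch.Size([2, 64, 128, 25])
```

Restricting attention to a w-frame window keeps the attention map at V x (V*w) per frame rather than (T*V) x (T*V) for the full sequence, which matches the abstract's motivation for narrowing the scope of self-attention to emphasize short-term motion.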

Results

The paper is indexed under several task tags (Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, Action Recognition), all reporting the same benchmark results, shown once below. Accuracy values are percentages.

Dataset | Metric | Value | Model
NTU RGB+D 120 | Accuracy (Cross-Setup) | 90.4 | STGAT
NTU RGB+D 120 | Accuracy (Cross-Subject) | 88.7 | STGAT
NTU RGB+D 120 | Ensembled Modalities | 4 | STGAT
Kinetics-400 | Actions Top-1 (S1) | 39.2 | STGAT
NTU RGB+D | Accuracy (CS) | 92.8 | STGAT
NTU RGB+D | Accuracy (CV) | 97.3 | STGAT
NTU RGB+D | Ensembled Modalities | 4 | STGAT

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Catching Bid-rigging Cartels with Graph Attention Neural Networks (2025-07-16)
Wavelet-Enhanced Neural ODE and Graph Attention for Interpretable Energy Forecasting (2025-07-14)
Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence (2025-07-02)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)