Papers With Code 2


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Constructing Stronger and Faster Baselines for Skeleton-based Action Recognition

Yi-Fan Song, Zhang Zhang, Caifeng Shan, Liang Wang

2021-06-29 · Skeleton Based Action Recognition · Action Recognition
Paper · PDF · Code (official)

Abstract

One essential problem in skeleton-based action recognition is how to extract discriminative features over all skeleton joints. However, recent State-Of-The-Art (SOTA) models for this task tend to be exceedingly sophisticated and over-parameterized. Their low training and inference efficiency increases the cost of validating model architectures on large-scale datasets. To address this issue, recent advanced separable convolutional layers are embedded into an early-fused Multiple Input Branches (MIB) network, constructing an efficient Graph Convolutional Network (GCN) baseline for skeleton-based action recognition. In addition, based on this baseline, we design a compound scaling strategy that expands the model's width and depth synchronously, eventually obtaining a family of efficient GCN baselines with high accuracy and few trainable parameters, termed EfficientGCN-Bx, where "x" denotes the scaling coefficient. On two large-scale datasets, i.e., NTU RGB+D 60 and 120, the proposed EfficientGCN-B4 baseline outperforms other SOTA methods, e.g., achieving 91.7% accuracy on the cross-subject benchmark of the NTU 60 dataset, while being 3.15x smaller and 3.21x faster than MS-G3D, one of the best SOTA methods. The source code in PyTorch and the pretrained models are available at https://github.com/yfsong0709/EfficientGCNv1.
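The compound scaling idea from the abstract can be illustrated with a small sketch: a single coefficient x grows channel width and layer depth together, producing a family of configurations (B0, B2, B4, ...) from one base model. The scaling rule and the coefficients `alpha` and `beta` below are hypothetical placeholders for illustration, not the paper's actual values; see the linked repository for the real configurations.

```python
def compound_scale(base_width, base_depth, x, alpha=1.2, beta=1.35):
    """Illustrative compound-scaling rule: expand channel width and
    network depth synchronously as the scaling coefficient x grows.
    alpha and beta are assumed example values, not those of the paper."""
    width = int(round(base_width * alpha ** x))
    depth = int(round(base_depth * beta ** x))
    return width, depth

# Derive a family of scaled baselines from one base configuration,
# analogous to EfficientGCN-B0 / -B2 / -B4.
for x in (0, 2, 4):
    w, d = compound_scale(base_width=64, base_depth=4, x=x)
    print(f"B{x}: width={w}, depth={d}")
```

The key design point is that a single coefficient controls both dimensions at once, so the model family stays on a consistent accuracy/size trade-off curve instead of tuning width and depth independently.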

Results

Accuracy (%) for skeleton-based action recognition:

Dataset        Benchmark        EfficientGCN-B0   EfficientGCN-B2   EfficientGCN-B4
NTU RGB+D 60   Cross-Subject    89.9              90.9              92.1
NTU RGB+D 60   Cross-View       94.7              95.5              96.1
NTU RGB+D 120  Cross-Subject    85.9              87.9              88.7
NTU RGB+D 120  Cross-Setup      84.3              88.0              89.1

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)