Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Skeleton-based Action Recognition via Temporal-Channel Aggregation

Shengqin Wang, Yongji Zhang, Minghao Zhao, Hong Qi, Kai Wang, Fenglin Wei, Yu Jiang

2022-05-31 | Skeleton Based Action Recognition | Action Recognition

Abstract

Skeleton-based action recognition methods are limited by how well they extract semantics from spatio-temporal skeletal maps. Current methods have difficulty combining features from the temporal and spatial graph dimensions effectively and tend to emphasize one at the expense of the other. In this paper, we propose a Temporal-Channel Aggregation Graph Convolutional Network (TCA-GCN) that learns spatial and temporal topologies dynamically and efficiently aggregates topological features across different temporal and channel dimensions for skeleton-based action recognition. We use the Temporal Aggregation module to learn temporal dimensional features and the Channel Aggregation module to efficiently combine spatial dynamic channel-wise topological features with temporal dynamic topological features. In addition, we extract multi-scale skeletal features in temporal modeling and fuse them with an attention mechanism. Extensive experiments show that our model outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
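The abstract mentions extracting multi-scale temporal features and fusing them with an attention mechanism. The following is a minimal sketch of that general idea only, not the TCA-GCN implementation: it assumes a dilated three-frame average as a stand-in for a learned dilated temporal convolution, and it assumes attention weights obtained by a softmax over each branch's globally pooled activation. The function names (`dilated_temporal_mean`, `attention_fuse`, `multi_scale_fuse`) and the dilation values are hypothetical.

```python
import math


def dilated_temporal_mean(x, dilation):
    """Average each frame with its neighbors at +/- dilation (edges clamped).

    x is a list of T frames, each a list of C channel values.
    This is an assumed stand-in for a learned dilated temporal convolution.
    """
    num_frames = len(x)
    num_channels = len(x[0])
    out = []
    for t in range(num_frames):
        prev_t = max(t - dilation, 0)
        next_t = min(t + dilation, num_frames - 1)
        out.append([
            (x[prev_t][c] + x[t][c] + x[next_t][c]) / 3.0
            for c in range(num_channels)
        ])
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(branches):
    """Fuse branches with softmax weights from globally pooled activations."""
    num_frames = len(branches[0])
    num_channels = len(branches[0][0])
    # One scalar score per branch: the mean over all frames and channels.
    scores = [
        sum(sum(frame) for frame in branch) / (num_frames * num_channels)
        for branch in branches
    ]
    weights = softmax(scores)
    fused = [
        [
            sum(w * branch[t][c] for w, branch in zip(weights, branches))
            for c in range(num_channels)
        ]
        for t in range(num_frames)
    ]
    return fused, weights


def multi_scale_fuse(x, dilations=(1, 2, 4)):
    """Build one branch per (assumed) dilation and fuse them with attention."""
    branches = [dilated_temporal_mean(x, d) for d in dilations]
    return attention_fuse(branches)
```

Calling `multi_scale_fuse(x)` on a `T x C` nested list returns the fused `T x C` features together with one attention weight per temporal scale; the weights sum to 1. In a trained network the branch scores would come from learned parameters rather than plain pooling.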

Results

Dataset          Metric                      Value   Model
NTU RGB+D 120    Accuracy (Cross-Setup)      90.8    TCA-GCN
NTU RGB+D 120    Accuracy (Cross-Subject)    89.4    TCA-GCN
NTU RGB+D 120    Ensembled Modalities        4       TCA-GCN
N-UCLA           Accuracy                    97      TCA-GCN
NTU RGB+D        Accuracy (CS)               92.8    TCA-GCN
NTU RGB+D        Accuracy (CV)               97      TCA-GCN
NTU RGB+D        Ensembled Modalities        4       TCA-GCN

The same seven results are listed under each of these task labels: Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, Action Recognition.
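The "Ensembled Modalities" rows report that four input streams were combined. A common way to combine streams in skeleton-based recognition is score-level fusion: sum the per-class scores from each stream and take the highest-scoring class. The sketch below illustrates that generic technique under stated assumptions; the stream names, the equal default weights, and the function `ensemble_predict` are hypothetical, and the page does not say which modalities or weights were used for TCA-GCN.

```python
def ensemble_predict(stream_scores, weights=None):
    """Fuse per-class scores from several streams by weighted sum.

    stream_scores maps a stream name to a list of per-class scores.
    weights maps a stream name to a float; equal weights are assumed
    when it is omitted.
    Returns (predicted_class_index, fused_scores).
    """
    names = sorted(stream_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    num_classes = len(stream_scores[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        scores = stream_scores[name]
        if len(scores) != num_classes:
            raise ValueError(f"stream {name!r} has a different class count")
        for k, score in enumerate(scores):
            fused[k] += weights[name] * score
    predicted = max(range(num_classes), key=fused.__getitem__)
    return predicted, fused


# Hypothetical scores for one sample over three classes and four streams.
example_scores = {
    "joint": [0.1, 0.7, 0.2],
    "bone": [0.2, 0.5, 0.3],
    "joint_motion": [0.6, 0.3, 0.1],
    "bone_motion": [0.3, 0.4, 0.3],
}
predicted_class, fused_scores = ensemble_predict(example_scores)
```

With these made-up scores, class 1 has the largest fused score, even though the `joint_motion` stream alone would pick class 0. That is the usual motivation for ensembling: streams that disagree on individual samples can still agree in aggregate.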

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)