Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment

Kai Zhou, Shuhai Zhang, Zeng You, Jinwu Hu, Mingkui Tan, Fei Liu

2025-07-01

Tasks: One-Shot 3D Action Recognition, Skeleton Based Action Recognition, Zero Shot Skeletal Action Recognition, Transfer Learning, Action Recognition

Abstract

Zero-shot skeleton-based action recognition aims to classify unseen skeleton-based human actions without prior exposure to such categories during training. This task is extremely challenging due to the difficulty in generalizing from known to unknown actions. Previous studies typically use two-stage training: pre-training skeleton encoders on seen action categories using cross-entropy loss and then aligning pre-extracted skeleton and text features, enabling knowledge transfer to unseen classes through skeleton-text alignment and language models' generalization. However, their efficacy is hindered by 1) insufficient discrimination for skeleton features, as the fixed skeleton encoder fails to capture necessary alignment information for effective skeleton-text alignment; 2) the neglect of alignment bias between skeleton and unseen text features during testing. To this end, we propose a prototype-guided feature alignment paradigm for zero-shot skeleton-based action recognition, termed PGFA. Specifically, we develop an end-to-end cross-modal contrastive training framework to improve skeleton-text alignment, ensuring sufficient discrimination for skeleton features. Additionally, we introduce a prototype-guided text feature alignment strategy to mitigate the adverse impact of the distribution discrepancy during testing. We provide a theoretical analysis to support our prototype-guided text feature alignment strategy and empirically evaluate our overall PGFA on three well-known datasets. Compared with the top competitor SMIE method, our PGFA achieves absolute accuracy improvements of 22.96%, 12.53%, and 18.54% on the NTU-60, NTU-120, and PKU-MMD datasets, respectively.
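
The skeleton-text alignment idea described in the abstract can be illustrated with a generic sketch. The code below is not the PGFA implementation: it assumes skeleton and text features are already available as arrays of equal dimension, uses a standard symmetric contrastive (InfoNCE-style) loss as a stand-in for the paper's training objective, and omits the prototype-guided text feature alignment step, whose details the abstract does not give. The function names and the temperature value are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(skel_feats, text_feats, temperature=0.1):
    """Symmetric contrastive loss over a batch of matched (skeleton, text) pairs.

    Row i of skel_feats is assumed to match row i of text_feats.
    The temperature is an illustrative value, not one taken from the paper.
    """
    s = l2_normalize(skel_feats)
    t = l2_normalize(text_feats)
    logits = s @ t.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(s))
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # skeleton -> text
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> skeleton
    return 0.5 * (loss_s2t + loss_t2s)

def zero_shot_predict(skel_feats, unseen_text_feats):
    """Assign each skeleton feature to the unseen class with the closest text feature."""
    sims = l2_normalize(skel_feats) @ l2_normalize(unseen_text_feats).T
    return sims.argmax(axis=1)

if __name__ == "__main__":
    # Toy data: three unseen classes with 4-dimensional text features.
    text = np.eye(3, 4)
    skel = text + 0.01                      # skeleton features near their class text
    print(zero_shot_predict(skel, text))
    print(contrastive_loss(skel, text))
```

At test time only `zero_shot_predict` is needed: unseen class names are embedded by the text encoder, and each skeleton sample receives the label of the nearest text feature.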

Results

The same seven results are listed under each of the following tasks: Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, 3D Action Recognition, Action Recognition.

Dataset | Metric | Value (%) | Model
NTU RGB+D 120 | Accuracy (10 unseen classes) | 79.99 | PGFA
NTU RGB+D 120 | Accuracy (24 unseen classes) | 59.42 | PGFA
NTU RGB+D 120 | Random Split Accuracy | 71.38 | PGFA
PKU-MMD | Random Split Accuracy | 87.8 | PGFA
NTU RGB+D | Accuracy (12 unseen classes) | 55.99 | PGFA
NTU RGB+D | Accuracy (5 unseen classes) | 80.26 | PGFA
NTU RGB+D | Random Split Accuracy | 93.17 | PGFA

Related Papers

RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)
Robust-Multi-Task Gradient Boosting (2025-07-15)
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift (2025-07-12)
The Bayesian Approach to Continual Learning: An Overview (2025-07-11)
Contrastive and Transfer Learning for Effective Audio Fingerprinting through a Real-World Evaluation Protocol (2025-07-08)