Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition

Pranay Gupta, Divyanshu Sharma, Ravi Kiran Sarvadevabhatla

2021-01-27 · Tasks: Generalized Zero-Shot Learning, POS, Zero-Shot Skeletal Action Recognition, Generalized Zero-Shot Skeletal Action Recognition, Action Recognition, Zero-Shot Learning
Paper · PDF · Code (official)

Abstract

We introduce SynSE, a novel syntactically guided generative approach for Zero-Shot Learning (ZSL). Our end-to-end approach learns progressively refined generative embedding spaces constrained within and across the involved modalities (visual, language). The inter-modal constraints are defined between action sequence embedding and embeddings of Parts of Speech (PoS) tagged words in the corresponding action description. We deploy SynSE for the task of skeleton-based action sequence recognition. Our design choices enable SynSE to generalize compositionally, i.e., recognize sequences whose action descriptions contain words not encountered during training. We also extend our approach to the more challenging Generalized Zero-Shot Learning (GZSL) problem via a confidence-based gating mechanism. We are the first to present zero-shot skeleton action recognition results on the large-scale NTU-60 and NTU-120 skeleton action datasets with multiple splits. Our results demonstrate SynSE's state-of-the-art performance in both ZSL and GZSL settings compared to strong baselines on the NTU-60 and NTU-120 datasets. The code and pretrained models are available at https://github.com/skelemoa/synse-zsl
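For readers skimming the abstract, two of its ideas can be made concrete with a small sketch: the class description is PoS-tagged so that verbs and nouns get separate embeddings ("syntactic guidance"), and a GZSL prediction is routed through a confidence gate between a seen-class classifier and a zero-shot classifier. The snippet below is an illustrative approximation only, not the authors' implementation; the NLTK tagger, the `embed_word` lookup, the sklearn-style `seen_clf`/`zsl_clf` interfaces, and the fixed gating threshold are all assumptions.

```python
# Illustrative sketch of two ideas from the abstract; NOT the SynSE code.
# May require: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
import numpy as np
import nltk

def pos_split_embedding(description, embed_word):
    """Split an action description into verb and noun embeddings
    (each PoS gets its own embedding, per the 'syntactic guidance' idea)."""
    tokens = nltk.word_tokenize(description.lower())
    tagged = nltk.pos_tag(tokens)
    verbs = [embed_word(w) for w, t in tagged if t.startswith("VB")]
    nouns = [embed_word(w) for w, t in tagged if t.startswith("NN")]
    verb_emb = np.mean(verbs, axis=0) if verbs else None
    noun_emb = np.mean(nouns, axis=0) if nouns else None
    return verb_emb, noun_emb

def gated_gzsl_predict(skeleton_feat, seen_clf, zsl_clf, threshold=0.5):
    """Confidence-based gating for GZSL: trust the seen-class classifier
    only when it is confident, otherwise fall back to the ZSL classifier."""
    seen_probs = seen_clf.predict_proba(skeleton_feat[None])[0]
    if seen_probs.max() >= threshold:
        return ("seen", int(seen_probs.argmax()))
    return ("unseen", int(zsl_clf.predict(skeleton_feat[None])[0]))
```

In the paper itself, the alignment between skeleton features and PoS-specific label embeddings is learned through generative latent embedding spaces rather than a fixed word-embedding lookup, and the gate is a learned component rather than a hand-set threshold, so treat this only as a reading aid.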

Results

The same SynSE results are cross-listed on several task leaderboards (Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, 3D Action Recognition, Action Recognition); they are reported once below.

Dataset       | Metric                            | Value | Model
NTU RGB+D 120 | Accuracy (10 unseen classes)      | 62.69 | SynSE
NTU RGB+D 120 | Accuracy (24 unseen classes)      | 38.7  | SynSE
PKU-MMD       | Random Split Accuracy             | 53.85 | SynSE
NTU RGB+D     | Accuracy (12 unseen classes)      | 33.3  | SynSE
NTU RGB+D     | Accuracy (5 unseen classes)       | 75.81 | SynSE
NTU RGB+D     | Random Split Accuracy             | 64.19 | SynSE
NTU RGB+D     | Harmonic Mean (12 unseen classes) | 36.33 | SynSE
NTU RGB+D     | Harmonic Mean (5 unseen classes)  | 59.02 | SynSE
NTU RGB+D 120 | Harmonic Mean (10 unseen classes) | 54.94 | SynSE
NTU RGB+D 120 | Harmonic Mean (24 unseen classes) | 41.04 | SynSE
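One clarification on the "Harmonic Mean" rows: in generalized zero-shot evaluation this is conventionally the harmonic mean of seen-class accuracy and unseen-class accuracy, H = 2·As·Au / (As + Au); the per-group accuracies themselves are not broken out on this page. The helper below simply restates that formula, and the numbers in the example are placeholders, not values reported by SynSE.

```python
def gzsl_harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """Standard GZSL summary metric: H = 2 * As * Au / (As + Au)."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2.0 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# Placeholder accuracies only (not taken from the table above):
print(gzsl_harmonic_mean(60.0, 55.0))  # ~57.39
```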

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation (2025-07-14)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning (2025-06-26)
Zero-Shot Learning for Obsolescence Risk Forecasting (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)