Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization

Yujie Zhou, Wenwen Qiang, Anyi Rao, Ning Lin, Bing Su, Jiaqi Wang

2023-08-07 · Skeleton Based Action Recognition · Zero Shot Skeletal Action Recognition · Action Recognition
Paper · PDF · Code (official)

Abstract

Zero-shot skeleton-based action recognition aims to recognize actions of unseen categories after training on data of seen categories. The key is to build the connection between the visual and semantic spaces from seen to unseen classes. Previous studies have primarily focused on encoding sequences into a single feature vector and subsequently mapping the features to an identical anchor point within the embedding space. Their performance is hindered by 1) ignoring the global visual/semantic distribution alignment, which limits their ability to capture the true interdependence between the two spaces, and 2) neglecting temporal information, since the frame-wise features with rich action clues are directly pooled into a single feature vector. We propose a new zero-shot skeleton-based action recognition method via mutual information (MI) estimation and maximization. Specifically, 1) we maximize the MI between the visual and semantic spaces for distribution alignment; 2) we leverage temporal information for estimating the MI by encouraging the MI to increase as more frames are observed. Extensive experiments on three large-scale skeleton action datasets confirm the effectiveness of our method. Code: https://github.com/YujieOuO/SMIE.
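The two ingredients the abstract describes can be illustrated with a minimal sketch: a contrastive (InfoNCE-style) lower bound on the MI between paired visual and semantic features, and zero-shot classification by nearest semantic embedding. This is a generic illustration of the technique, not the authors' implementation (the official SMIE code is at the GitHub link above); the function names, the temperature value, and the use of cosine similarity are assumptions made here for the example.

```python
import numpy as np

def infonce_mi_lower_bound(visual, semantic, temperature=0.1):
    """InfoNCE-style lower bound on the MI between N paired visual and
    semantic feature vectors (rows). Matched pairs share a row index.
    This is a common MI estimator, assumed here for illustration."""
    # L2-normalize both feature sets so the dot product is cosine similarity.
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    s = semantic / np.linalg.norm(semantic, axis=1, keepdims=True)
    logits = v @ s.T / temperature                 # N x N similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = len(v)
    # Bound: log N + mean log-softmax score of the matched (diagonal) pairs.
    return np.log(n) + np.mean(np.diag(log_probs))

def zero_shot_classify(visual_feat, class_semantics):
    """Assign an unseen-class label by cosine similarity between one visual
    feature and the semantic embedding of each unseen class."""
    v = visual_feat / np.linalg.norm(visual_feat)
    s = class_semantics / np.linalg.norm(class_semantics, axis=1, keepdims=True)
    return int(np.argmax(s @ v))
```

Training would maximize `infonce_mi_lower_bound` over mini-batches of aligned (skeleton feature, class-description embedding) pairs, pulling the two distributions together; the temporal term in the paper additionally encourages this bound to grow as more frames of the sequence are observed.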

Results

All results are for the SMIE model (accuracy, %). The site indexes the same figures under several task tags (Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, 3D Action Recognition, Action Recognition); each distinct result is listed once below.

Dataset          Metric                         Value   Model
NTU RGB+D        Accuracy (5 unseen classes)    77.98   SMIE
NTU RGB+D        Accuracy (12 unseen classes)   40.18   SMIE
NTU RGB+D        Random Split Accuracy          65.08   SMIE
NTU RGB+D 120    Accuracy (10 unseen classes)   65.74   SMIE
NTU RGB+D 120    Accuracy (24 unseen classes)   45.3    SMIE
NTU RGB+D 120    Random Split Accuracy          46.4    SMIE
PKU-MMD          Random Split Accuracy          60.83   SMIE

Related Papers

- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
- EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
- Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
- CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
- Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
- Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
- Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)