Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Shap-Mix: Shapley Value Guided Mixing for Long-Tailed Skeleton Based Action Recognition

Jiahang Zhang, Lilang Lin, Jiaying Liu

2024-07-17 | Skeleton Based Action Recognition | Data Augmentation | Saliency Prediction | Action Recognition

Abstract

In real-world scenarios, human actions often fall into a long-tailed distribution. It makes the existing skeleton-based action recognition works, which are mostly designed based on balanced datasets, suffer from a sharp performance degradation. Recently, many efforts have been made to image/video long-tailed learning. However, directly applying them to skeleton data can be sub-optimal due to the lack of consideration of the crucial spatial-temporal motion patterns, especially for some modality-specific methodologies such as data augmentation. To this end, considering the crucial role of the body parts in the spatially concentrated human actions, we attend to the mixing augmentations and propose a novel method, Shap-Mix, which improves long-tailed learning by mining representative motion patterns for tail categories. Specifically, we first develop an effective spatial-temporal mixing strategy for the skeleton to boost representation quality. Then, the employed saliency guidance method is presented, consisting of the saliency estimation based on Shapley value and a tail-aware mixing policy. It preserves the salient motion parts of minority classes in mixed data, explicitly establishing the relationships between crucial body structure cues and high-level semantics. Extensive experiments on three large-scale skeleton datasets show our remarkable performance improvement under both long-tailed and balanced settings. Our project is publicly available at: https://jhang2020.github.io/Projects/Shap-Mix/Shap-Mix.html.
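
The abstract names two ingredients: Shapley-value saliency over body parts and a mixing step that keeps the salient parts of a tail-class sample. The following is a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names `shapley_part_saliency` and `mix_keep_salient`, the `(T, V, C)` skeleton layout (frames, joints, channels), the all-zero baseline for "absent" parts, the permutation-sampling estimator, the purely spatial (part-level) mix, and the label weight set to the fraction of joints kept.

```python
import numpy as np

def shapley_part_saliency(score_fn, skeleton, parts, baseline, n_perm=50, rng=None):
    """Monte Carlo (permutation-sampling) estimate of per-part Shapley values.

    score_fn: maps a (T, V, C) skeleton to a scalar, e.g. a class logit.
    parts:    list of joint-index arrays, one per body part (hypothetical grouping).
    baseline: (T, V, C) array that stands in for parts that are "absent".
    """
    rng = np.random.default_rng(rng)
    values = np.zeros(len(parts))
    for _ in range(n_perm):
        order = rng.permutation(len(parts))
        current = baseline.copy()
        prev = score_fn(current)
        for p in order:
            # Reveal part p and credit it with the change in score.
            current[:, parts[p]] = skeleton[:, parts[p]]
            new = score_fn(current)
            values[p] += new - prev
            prev = new
    return values / n_perm

def mix_keep_salient(tail_sample, other_sample, parts, saliency, n_keep=2):
    """Keep the n_keep most salient parts of tail_sample; fill the rest from other_sample.

    Returns the mixed skeleton and an assumed label weight for the tail class
    (fraction of joints that still come from tail_sample).
    """
    keep = set(np.argsort(saliency)[-n_keep:].tolist())
    mixed = tail_sample.copy()
    for p, joints in enumerate(parts):
        if p not in keep:
            mixed[:, joints] = other_sample[:, joints]
    kept_joints = sum(len(parts[p]) for p in keep)
    lam = kept_joints / tail_sample.shape[1]
    return mixed, lam

# Toy usage: 25 joints split into 5 equal groups (not the real NTU part layout).
T, V, C = 4, 25, 3
data_rng = np.random.default_rng(0)
tail = data_rng.normal(size=(T, V, C))
other = data_rng.normal(size=(T, V, C))
parts = np.array_split(np.arange(V), 5)

# Stand-in scorer: only parts 1 and 3 influence the score.
weights = np.zeros(V)
weights[parts[1]] = 1.0
weights[parts[3]] = 2.0
score = lambda s: float((np.abs(s) * weights[None, :, None]).sum())

saliency = shapley_part_saliency(score, tail, parts, np.zeros_like(tail), n_perm=10, rng=0)
mixed, lam = mix_keep_salient(tail, other, parts, saliency, n_keep=2)
```

In practice `score_fn` would wrap a trained recognition model, and the choice of which class plays the role of `tail_sample` would follow a tail-aware policy; the abstract does not specify either, so both are left open here.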

Results

The same results are listed under each of the following tasks: Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, Action Recognition.

Dataset        | Metric                   | Value | Model
NTU RGB+D 120  | Accuracy (Cross-Setup)   | 91.7  | Shap-Mix
NTU RGB+D 120  | Accuracy (Cross-Subject) | 90.4  | Shap-Mix
NTU RGB+D 120  | Ensembled Modalities     | 4     | Shap-Mix
NTU RGB+D      | Accuracy (CS)            | 93.7  | Shap-Mix
NTU RGB+D      | Accuracy (CV)            | 97.1  | Shap-Mix
NTU RGB+D      | Ensembled Modalities     | 4     | Shap-Mix
