Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition

Yuhang Wen, Mengyuan Liu, Songtao Wu, Beichen Ding

2024-10-09 · 3D Action Recognition · Skeleton Based Action Recognition · Group Activity Recognition · Action Recognition · Human Interaction Recognition

Paper · PDF · Code (official)

Abstract

Skeleton-based multi-entity action recognition is a challenging task aiming to identify interactive actions or group activities involving multiple diverse entities. Existing models for individuals often fall short in this task due to the inherent distribution discrepancies among entity skeletons, leading to suboptimal backbone optimization. To this end, we introduce a Convex Hull Adaptive Shift based multi-Entity action recognition method (CHASE), which mitigates inter-entity distribution gaps and unbiases subsequent backbones. Specifically, CHASE comprises a learnable parameterized network and an auxiliary objective. The parameterized network achieves plausible, sample-adaptive repositioning of skeleton sequences through two key components. First, the Implicit Convex Hull Constrained Adaptive Shift ensures that the new origin of the coordinate system is within the skeleton convex hull. Second, the Coefficient Learning Block provides a lightweight parameterization of the mapping from skeleton sequences to their specific coefficients in convex combinations. Moreover, to guide the optimization of this network for discrepancy minimization, we propose the Mini-batch Pair-wise Maximum Mean Discrepancy as the additional objective. CHASE operates as a sample-adaptive normalization method to mitigate inter-entity distribution discrepancies, thereby reducing data bias and improving the subsequent classifier's multi-entity action recognition performance. Extensive experiments on six datasets, including NTU Mutual 11/26, H2O, Assembly101, Collective Activity and Volleyball, consistently verify our approach by seamlessly adapting to single-entity backbones and boosting their performance in multi-entity scenarios. Our code is publicly available at https://github.com/Necolizer/CHASE .
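The two mechanisms the abstract describes can be sketched compactly: the new coordinate origin is a convex combination of the joints (softmax-normalized coefficients guarantee non-negativity and sum-to-one, so the origin lies inside the skeleton's convex hull), and a maximum mean discrepancy term measures the remaining distribution gap between entities. The function names, tensor shapes, per-frame shifting, and the RBF kernel choice below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: outputs are >= 0 and sum to 1,
    # i.e. valid coefficients of a convex combination.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def convex_hull_adaptive_shift(skeleton, logits):
    """Shift a skeleton so its origin is a convex combination of its joints.

    skeleton: (T, J, C) array - T frames, J joints, C coordinate dims.
    logits:   (J,) per-joint scores, assumed to come from a small
              coefficient-learning network (hypothetical here).

    Because the origin is a convex combination of the joints, it is
    guaranteed to lie inside the joints' convex hull - the "implicit
    convex hull constraint" from the abstract.
    """
    w = softmax(logits)                            # convex coefficients
    origin = np.einsum('j,tjc->tc', w, skeleton)   # per-frame combination
    return skeleton - origin[:, None, :]

def rbf_mmd(x, y, gamma=1.0):
    """Biased squared MMD with an RBF kernel between feature sets
    x: (n, d) and y: (m, d). A pairwise version of such a statistic
    could serve as the auxiliary discrepancy-minimization objective."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()
```

After the shift, the coefficient-weighted mean of the joints is exactly zero in every frame, which is what normalizes entities with different skeleton distributions into a shared frame of reference.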

Results

Task | Dataset | Metric | Value | Model
---- | ------- | ------ | ----- | -----
Video | Assembly101 | Actions Top-1 | 28.03 | CHASE (CTR-GCN)
Video | H2O (2 Hands and Objects) | Accuracy | 94.77 | CHASE (STSA-Net)
Temporal Action Localization | Assembly101 | Actions Top-1 | 28.03 | CHASE (CTR-GCN)
Temporal Action Localization | H2O (2 Hands and Objects) | Accuracy | 94.77 | CHASE (STSA-Net)
Zero-Shot Learning | Assembly101 | Actions Top-1 | 28.03 | CHASE (CTR-GCN)
Zero-Shot Learning | H2O (2 Hands and Objects) | Accuracy | 94.77 | CHASE (STSA-Net)
Activity Recognition | Assembly101 | Actions Top-1 | 28.03 | CHASE (CTR-GCN)
Activity Recognition | H2O (2 Hands and Objects) | Accuracy | 94.77 | CHASE (STSA-Net)
Activity Recognition | Collective Activity | Accuracy | 89.61 | CHASE (CTR-GCN)
Activity Recognition | Volleyball | Accuracy | 92.89 | CHASE (CTR-GCN)
Action Localization | Assembly101 | Actions Top-1 | 28.03 | CHASE (CTR-GCN)
Action Localization | H2O (2 Hands and Objects) | Accuracy | 94.77 | CHASE (STSA-Net)
Action Detection | H2O (2 Hands and Objects) | Accuracy | 94.77 | CHASE (STSA-Net)
Human Interaction Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 96.5 | CHASE (CTR-GCN)
Human Interaction Recognition | NTU RGB+D | Accuracy (Cross-View) | 98.8 | CHASE (CTR-GCN)
Human Interaction Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 92.3 | CHASE (CTR-GCN)
Human Interaction Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 91.3 | CHASE (CTR-GCN)
3D Action Recognition | Assembly101 | Actions Top-1 | 28.03 | CHASE (CTR-GCN)
3D Action Recognition | H2O (2 Hands and Objects) | Accuracy | 94.77 | CHASE (STSA-Net)
Action Recognition | Assembly101 | Actions Top-1 | 28.03 | CHASE (CTR-GCN)
Action Recognition | H2O (2 Hands and Objects) | Accuracy | 94.77 | CHASE (STSA-Net)

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)