
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation

Wanjiang Weng, Hongsong Wang, JunBo Wang, Lei He, GuoSen Xie

2024-12-12 | Action Detection | Representation Learning | Skeleton Based Action Recognition | Contrastive Learning | Action Recognition | Retrieval

Abstract

Contrastive learning has recently achieved great success in skeleton-based representation learning. However, the prevailing methods are predominantly negative-based, requiring an additional momentum encoder and memory bank to obtain negative samples, which makes model training more difficult. Furthermore, these methods primarily concentrate on learning a global representation for recognition and retrieval tasks, while overlooking the rich and detailed local representations that are crucial for dense prediction tasks. To alleviate these issues, we introduce a Unified Skeleton-based Dense Representation Learning framework based on feature decorrelation, called USDRL, which employs feature decorrelation across temporal, spatial, and instance domains in a multi-grained manner to reduce redundancy among the dimensions of the representations and thereby maximize the information extracted from features. Additionally, we design a Dense Spatio-Temporal Encoder (DSTE) to capture fine-grained action representations effectively, thereby enhancing the performance of dense prediction tasks. Comprehensive experiments, conducted on the benchmarks NTU-60, NTU-120, PKU-MMD I, and PKU-MMD II, across diverse downstream tasks including action recognition, action retrieval, and action detection, conclusively demonstrate that our approach significantly outperforms the current state-of-the-art (SOTA) approaches. Our code and models are available at https://github.com/wengwanjiang/USDRL.
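To make the idea of "reducing redundancy among dimensions" concrete, the following is a minimal sketch of a generic decorrelation penalty: standardize each feature dimension over a batch, form the correlation matrix, and sum the squared off-diagonal entries. This is an illustration only and is not the USDRL loss; the function names `standardize` and `decorrelation_penalty` are hypothetical, and the abstract does not specify how the temporal, spatial, and instance domains are handled.

```python
import math
import random


def standardize(features):
    """Center each column and scale it to unit variance (population std)."""
    n, d = len(features), len(features[0])
    cols = []
    for j in range(d):
        col = [row[j] for row in features]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        std = std if std > 1e-12 else 1.0  # guard against constant columns
        cols.append([(x - mean) / std for x in col])
    return cols  # list of d columns, each of length n


def decorrelation_penalty(features):
    """Sum of squared off-diagonal entries of the feature correlation matrix.

    Assumption: a generic redundancy measure, not the loss used in the paper.
    """
    cols = standardize(features)
    n, d = len(features), len(cols)
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                corr = sum(a * b for a, b in zip(cols[i], cols[j])) / n
                penalty += corr ** 2
    return penalty


if __name__ == "__main__":
    rng = random.Random(0)
    base = [rng.gauss(0.0, 1.0) for _ in range(256)]
    # Second dimension is a linear function of the first: fully redundant.
    redundant = [[x, 2.0 * x + 1.0] for x in base]
    # Second dimension is drawn independently: little redundancy.
    independent = [[x, rng.gauss(0.0, 1.0)] for x in base]
    print("redundant:  ", decorrelation_penalty(redundant))
    print("independent:", decorrelation_penalty(independent))
```

In this toy setting the redundant pair yields a much larger penalty than the independent pair, which is the behavior a decorrelation objective is meant to discourage. A training setup would compute such a term on learned embeddings and minimize it with an automatic differentiation framework.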

Results

Task | Dataset | Metric | Value | Model
Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.3 | 3s-USDRL (DSTE) (this work)
Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.6 | 3s-USDRL (DSTE) (this work)
Video | NTU RGB+D | Accuracy (CS) | 87.1 | 3s-USDRL (DSTE) (this work)
Video | NTU RGB+D | Accuracy (CV) | 93.2 | 3s-USDRL (DSTE) (this work)
Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.3 | 3s-USDRL (DSTE) (this work)
Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.6 | 3s-USDRL (DSTE) (this work)
Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 87.1 | 3s-USDRL (DSTE) (this work)
Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 93.2 | 3s-USDRL (DSTE) (this work)
Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.3 | 3s-USDRL (DSTE) (this work)
Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.6 | 3s-USDRL (DSTE) (this work)
Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 87.1 | 3s-USDRL (DSTE) (this work)
Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 93.2 | 3s-USDRL (DSTE) (this work)
Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.3 | 3s-USDRL (DSTE) (this work)
Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.6 | 3s-USDRL (DSTE) (this work)
Activity Recognition | NTU RGB+D | Accuracy (CS) | 87.1 | 3s-USDRL (DSTE) (this work)
Activity Recognition | NTU RGB+D | Accuracy (CV) | 93.2 | 3s-USDRL (DSTE) (this work)
Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.3 | 3s-USDRL (DSTE) (this work)
Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.6 | 3s-USDRL (DSTE) (this work)
Action Localization | NTU RGB+D | Accuracy (CS) | 87.1 | 3s-USDRL (DSTE) (this work)
Action Localization | NTU RGB+D | Accuracy (CV) | 93.2 | 3s-USDRL (DSTE) (this work)
Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.3 | 3s-USDRL (DSTE) (this work)
Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.6 | 3s-USDRL (DSTE) (this work)
Action Detection | NTU RGB+D | Accuracy (CS) | 87.1 | 3s-USDRL (DSTE) (this work)
Action Detection | NTU RGB+D | Accuracy (CV) | 93.2 | 3s-USDRL (DSTE) (this work)
3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.3 | 3s-USDRL (DSTE) (this work)
3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.6 | 3s-USDRL (DSTE) (this work)
3D Action Recognition | NTU RGB+D | Accuracy (CS) | 87.1 | 3s-USDRL (DSTE) (this work)
3D Action Recognition | NTU RGB+D | Accuracy (CV) | 93.2 | 3s-USDRL (DSTE) (this work)
Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.3 | 3s-USDRL (DSTE) (this work)
Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.6 | 3s-USDRL (DSTE) (this work)
Action Recognition | NTU RGB+D | Accuracy (CS) | 87.1 | 3s-USDRL (DSTE) (this work)
Action Recognition | NTU RGB+D | Accuracy (CV) | 93.2 | 3s-USDRL (DSTE) (this work)

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)