Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation

Chao Li, Qiaoyong Zhong, Di Xie, ShiLiang Pu

2018-04-17 · Skeleton Based Action Recognition · Action Recognition · Temporal Action Localization

Paper · PDF · Code (official)

Abstract

Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation of joint co-occurrences and the inter-frame representation of skeletons' temporal evolution. In this paper we propose an end-to-end convolutional co-occurrence feature learning framework. The co-occurrence features are learned with a hierarchical methodology, in which different levels of contextual information are aggregated gradually. First, point-level information of each joint is encoded independently. These encodings are then assembled into semantic representations in both the spatial and temporal domains. Specifically, we introduce a global spatial aggregation scheme, which learns superior joint co-occurrence features compared with local aggregation. In addition, raw skeleton coordinates as well as their temporal differences are integrated in a two-stream paradigm. Experiments show that our approach consistently outperforms other state-of-the-art methods on action recognition and detection benchmarks such as NTU RGB+D, SBU Kinect Interaction, and PKU-MMD.
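The abstract's core ideas can be illustrated with a small numpy sketch (this is a hedged illustration, not the authors' implementation; all function names, dimensions, and the random weights are assumptions for demonstration). Each joint is first encoded independently (point-level encoding); global spatial aggregation is then obtained by moving the joint axis into the channel axis, so any subsequent operation mixes information across all joints at once; finally, a second stream built from temporal differences of the raw coordinates is fused with the position stream.

```python
import numpy as np

# A skeleton clip is a tensor of shape (frames T, joints J, coords C).
T, J, C = 32, 25, 3            # 25 joints as in NTU RGB+D (assumption)
rng = np.random.default_rng(0)
clip = rng.random((T, J, C))

# Second-stream input: temporal difference of raw coordinates (motion).
motion = np.diff(clip, axis=0, prepend=clip[:1])   # same shape as clip

def point_level_encode(x, out_dim=16):
    """Stand-in for the per-joint encoding step: an independent linear
    map applied to each joint's coordinates (hypothetical weights)."""
    w = rng.random((x.shape[-1], out_dim))
    return x @ w                       # (T, J, out_dim)

def global_spatial_aggregate(feat):
    """Move joints into the channel axis; downstream convolutions then
    see every joint at once (the global aggregation scheme)."""
    return feat.transpose(0, 2, 1)     # (T, out_dim, J)

pos_feat = global_spatial_aggregate(point_level_encode(clip))
mot_feat = global_spatial_aggregate(point_level_encode(motion))

# Two-stream paradigm: position and motion features fused downstream.
fused = np.concatenate([pos_feat, mot_feat], axis=1)
print(fused.shape)                     # (32, 32, 25)
```

The transpose is the key design choice: with joints as channels, a convolution's receptive field spans the whole skeleton rather than a local joint neighborhood, which is what the paper means by learning co-occurrences globally.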

Results

Task                         | Dataset   | Metric                    | Value | Model
Video                        | PKU-MMD   | mAP@0.50 (CS)             | 92.6  | HCN
Video                        | PKU-MMD   | mAP@0.50 (CV)             | 94.2  | HCN
Video                        | NTU RGB+D | Accuracy (CS)             | 86.5  | HCN
Video                        | NTU RGB+D | Accuracy (CV)             | 91.1  | HCN
Temporal Action Localization | PKU-MMD   | mAP@0.50 (CS)             | 92.6  | HCN
Temporal Action Localization | PKU-MMD   | mAP@0.50 (CV)             | 94.2  | HCN
Temporal Action Localization | NTU RGB+D | Accuracy (CS)             | 86.5  | HCN
Temporal Action Localization | NTU RGB+D | Accuracy (CV)             | 91.1  | HCN
Zero-Shot Learning           | PKU-MMD   | mAP@0.50 (CS)             | 92.6  | HCN
Zero-Shot Learning           | PKU-MMD   | mAP@0.50 (CV)             | 94.2  | HCN
Zero-Shot Learning           | NTU RGB+D | Accuracy (CS)             | 86.5  | HCN
Zero-Shot Learning           | NTU RGB+D | Accuracy (CV)             | 91.1  | HCN
Activity Recognition         | PKU-MMD   | mAP@0.50 (CS)             | 92.6  | HCN
Activity Recognition         | PKU-MMD   | mAP@0.50 (CV)             | 94.2  | HCN
Activity Recognition         | NTU RGB+D | Accuracy (CS)             | 86.5  | HCN
Activity Recognition         | NTU RGB+D | Accuracy (CV)             | 91.1  | HCN
Action Localization          | PKU-MMD   | mAP@0.50 (CS)             | 92.6  | HCN
Action Localization          | PKU-MMD   | mAP@0.50 (CV)             | 94.2  | HCN
Action Localization          | NTU RGB+D | Accuracy (CS)             | 86.5  | HCN
Action Localization          | NTU RGB+D | Accuracy (CV)             | 91.1  | HCN
Pose Estimation              | RF-MMD    | mAP (@0.1, Through-wall)  | 78.5  | HCN
Pose Estimation              | RF-MMD    | mAP (@0.1, Visible)       | 82.5  | HCN
Action Detection             | PKU-MMD   | mAP@0.50 (CS)             | 92.6  | HCN
Action Detection             | PKU-MMD   | mAP@0.50 (CV)             | 94.2  | HCN
Action Detection             | NTU RGB+D | Accuracy (CS)             | 86.5  | HCN
Action Detection             | NTU RGB+D | Accuracy (CV)             | 91.1  | HCN
3D Action Recognition        | PKU-MMD   | mAP@0.50 (CS)             | 92.6  | HCN
3D Action Recognition        | PKU-MMD   | mAP@0.50 (CV)             | 94.2  | HCN
3D Action Recognition        | NTU RGB+D | Accuracy (CS)             | 86.5  | HCN
3D Action Recognition        | NTU RGB+D | Accuracy (CV)             | 91.1  | HCN
3D                           | RF-MMD    | mAP (@0.1, Through-wall)  | 78.5  | HCN
3D                           | RF-MMD    | mAP (@0.1, Visible)       | 82.5  | HCN
Action Recognition           | PKU-MMD   | mAP@0.50 (CS)             | 92.6  | HCN
Action Recognition           | PKU-MMD   | mAP@0.50 (CV)             | 94.2  | HCN
Action Recognition           | NTU RGB+D | Accuracy (CS)             | 86.5  | HCN
Action Recognition           | NTU RGB+D | Accuracy (CV)             | 91.1  | HCN
1 Image, 2*2 Stitchi         | RF-MMD    | mAP (@0.1, Through-wall)  | 78.5  | HCN
1 Image, 2*2 Stitchi         | RF-MMD    | mAP (@0.1, Visible)       | 82.5  | HCN

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)