Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization

Yuheng Yang, Haipeng Chen, Zhenguang Liu, Yingda Lyu, Beibei Zhang, Shuang Wu, Zhibo Wang, Kui Ren

Published 2023-06-13 · Tasks: Skeleton Based Action Recognition, Action Recognition

Abstract

Action recognition has long been a fundamental and intriguing problem in artificial intelligence. The task is challenging due to the high-dimensional nature of an action, as well as the subtle motion details to be considered. Current state-of-the-art approaches typically learn from articulated motion sequences in the straightforward 3D Euclidean space. However, the vanilla Euclidean space is not efficient for modeling important motion characteristics such as the joint-wise angular acceleration, which reveals the driving force behind the motion. Moreover, current methods typically attend to each channel equally and lack theoretical constraints on extracting task-relevant features from the input. In this paper, we seek to tackle these challenges from three aspects: (1) We propose to incorporate an acceleration representation, explicitly modeling the higher-order variations in motion. (2) We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention, where different representations (i.e., streams) supplement each other towards a more precise action recognition while attention capitalizes on those important channels. (3) We explore feature-level supervision for maximizing the extraction of task-relevant information and formulate this into a mutual information loss. Empirically, our approach sets the new state-of-the-art performance on three benchmark datasets, NTU RGB+D, NTU RGB+D 120, and NW-UCLA. Our code is anonymously released at https://github.com/ActionR-Group/Stream-GCN, hoping to inspire the community.
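The acceleration representation in point (1) amounts to higher-order temporal differences of the joint coordinates. Below is a minimal sketch of how velocity and acceleration streams can be derived from a skeleton sequence with finite differences; the function name, padding scheme, and array layout are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def motion_streams(joints: np.ndarray):
    """Derive velocity and acceleration streams from a skeleton sequence.

    joints: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    Returns (velocity, acceleration). Both are edge-padded back to length T
    so all streams stay frame-aligned (the padding choice is an assumption).
    """
    velocity = np.diff(joints, n=1, axis=0)       # first-order differences, (T-1, J, 3)
    acceleration = np.diff(joints, n=2, axis=0)   # second-order differences, (T-2, J, 3)
    velocity = np.pad(velocity, ((0, 1), (0, 0), (0, 0)), mode="edge")
    acceleration = np.pad(acceleration, ((0, 2), (0, 0), (0, 0)), mode="edge")
    return velocity, acceleration

# Example: one joint moving along x with constant acceleration (x = t^2),
# so the second difference should be constant at 2.
T = 5
t = np.arange(T, dtype=float)
joints = np.zeros((T, 1, 3))
joints[:, 0, 0] = t ** 2
vel, acc = motion_streams(joints)
print(acc[0, 0, 0])  # 2.0
```

In a multi-stream setup like the one the abstract describes, each such stream (position, velocity, acceleration, bone vectors, etc.) would be fed to its own branch and the predictions fused.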

Results

The same four results are listed under each of the page's task tags (Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, Action Recognition):

Dataset       | Metric                   | Value | Model
NTU RGB+D 120 | Accuracy (Cross-Setup)   | 91    | Stream-GCN
NTU RGB+D 120 | Accuracy (Cross-Subject) | 89.7  | Stream-GCN
NTU RGB+D     | Accuracy (CS)            | 92.9  | Stream-GCN
NTU RGB+D     | Accuracy (CV)            | 96.9  | Stream-GCN

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)