Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks

Hongsong Wang, Liang Wang

2017-04-09 · CVPR 2017
Tasks: 3D Action Recognition, Skeleton Based Action Recognition, Data Augmentation, Action Recognition, Temporal Action Localization

Abstract

Recently, skeleton-based action recognition has gained popularity thanks to cost-effective depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches based on handcrafted features are limited in their ability to represent the complexity of motion patterns. Recent methods that use Recurrent Neural Networks (RNNs) to process raw skeletons focus only on contextual dependencies in the temporal domain and neglect the spatial configurations of articulated skeletons. In this paper, we propose a novel two-stream RNN architecture that models both temporal dynamics and spatial configurations for skeleton-based action recognition. We explore two different structures for the temporal stream: a stacked RNN and a hierarchical RNN. The hierarchical RNN is designed according to human body kinematics. We also propose two effective methods to model the spatial structure by converting the spatial graph into a sequence of joints. To improve the generalization of our model, we further exploit 3D-transformation-based data augmentation, including rotation and scaling, to transform the 3D coordinates of skeletons during training. Experiments on 3D action recognition benchmark datasets show that our method brings a considerable improvement for a variety of actions, namely generic actions, interaction activities and gestures.
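The abstract names two concrete mechanisms: rotation and scaling augmentation of 3D skeleton coordinates, and converting the skeleton graph into a sequence of joints. The Python sketch below illustrates both under stated assumptions. The angle and scale ranges (±15 degrees, 0.9 to 1.1), sampling one transform per sequence, the Rz·Ry·Rx composition order, and all helper names (`augment`, `traversal_sequence`, etc.) are hypothetical choices, not values or code from the paper. `traversal_sequence` shows one plausible graph-to-sequence conversion (a depth-first walk that returns to the parent after each subtree); the paper's two spatial methods may order joints differently.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Frame = List[Point]  # one skeleton: a list of 3D joint coordinates
Matrix = List[List[float]]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(ax: float, ay: float, az: float) -> Matrix:
    """Rotation about the x, y and z axes (radians), composed as Rz * Ry * Rx (assumed order)."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(rz, matmul3(ry, rx))


def transform_sequence(frames: Sequence[Frame], rot: Matrix, scale: Sequence[float]) -> List[Frame]:
    """Rotate, then scale per axis, every joint in every frame with the same parameters."""
    out = []
    for frame in frames:
        new_frame = []
        for p in frame:
            r = [sum(rot[i][k] * p[k] for k in range(3)) for i in range(3)]
            new_frame.append((r[0] * scale[0], r[1] * scale[1], r[2] * scale[2]))
        out.append(new_frame)
    return out


def augment(frames: Sequence[Frame], rng: random.Random,
            max_angle_deg: float = 15.0,
            scale_range: Tuple[float, float] = (0.9, 1.1)) -> List[Frame]:
    """Sample one random rotation and one per-axis scaling for the whole sequence.
    The default ranges are hypothetical, not taken from the paper."""
    max_angle = math.radians(max_angle_deg)
    angles = [rng.uniform(-max_angle, max_angle) for _ in range(3)]
    scale = [rng.uniform(*scale_range) for _ in range(3)]
    return transform_sequence(frames, rotation_matrix(*angles), scale)


def traversal_sequence(children: Dict[int, List[int]], root: int) -> List[int]:
    """Depth-first walk that returns to the parent after each subtree, so that
    consecutive joints in the output are always connected by a bone."""
    seq = [root]
    for child in children.get(root, []):
        seq.extend(traversal_sequence(children, child))
        seq.append(root)  # step back to the parent joint
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    clip = [[(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.8, 0.0)]]  # one frame, three joints
    print(augment(clip, rng))
    # Hypothetical 4-joint skeleton: 0 = torso, 1 = neck, 2 = head, 3 = hip.
    print(traversal_sequence({0: [1, 3], 1: [2]}, 0))  # [0, 1, 2, 1, 0, 3, 0]
```

Sampling a single transform per sequence, rather than per frame, is a design choice in this sketch: it changes the viewpoint and body size while keeping the motion itself temporally consistent. Revisiting parent joints makes the joint sequence longer than the joint count, but every pair of consecutive joints fed to a spatial RNN is physically adjacent.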

Results

Task | Dataset | Metric | Value (%) | Model
Video | NTU RGB+D | Accuracy (CS) | 71.3 | Two-Stream RNN
Video | NTU RGB+D | Accuracy (CV) | 79.5 | Two-Stream RNN
Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 71.3 | Two-Stream RNN
Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 79.5 | Two-Stream RNN
Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 71.3 | Two-Stream RNN
Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 79.5 | Two-Stream RNN
Activity Recognition | NTU RGB+D | Accuracy (CS) | 71.3 | Two-Stream RNN
Activity Recognition | NTU RGB+D | Accuracy (CV) | 79.5 | Two-Stream RNN
Action Localization | NTU RGB+D | Accuracy (CS) | 71.3 | Two-Stream RNN
Action Localization | NTU RGB+D | Accuracy (CV) | 79.5 | Two-Stream RNN
Action Detection | NTU RGB+D | Accuracy (CS) | 71.3 | Two-Stream RNN
Action Detection | NTU RGB+D | Accuracy (CV) | 79.5 | Two-Stream RNN
3D Action Recognition | NTU RGB+D | Accuracy (CS) | 71.3 | Two-Stream RNN
3D Action Recognition | NTU RGB+D | Accuracy (CV) | 79.5 | Two-Stream RNN
Action Recognition | NTU RGB+D | Accuracy (CS) | 71.3 | Two-Stream RNN
Action Recognition | NTU RGB+D | Accuracy (CV) | 79.5 | Two-Stream RNN

CS = cross-subject evaluation protocol; CV = cross-view evaluation protocol.

Related Papers

Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Data Augmentation in Time Series Forecasting through Inverted Framework (2025-07-15)
Iceberg: Enhancing HLS Modeling with Synthetic Data (2025-07-14)
AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)