
Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

Jian Liu, Naveed Akhtar, Ajmal Mian

2017-07-04 · Tasks: Skeleton Based Action Recognition, Action Recognition, Temporal Action Localization

Abstract

We propose Human Pose Models that represent RGB and depth images of human poses independent of clothing textures, backgrounds, lighting conditions, body shapes and camera viewpoints. Learning such universal models requires training images where all factors are varied for every human pose. Capturing such data is prohibitively expensive. Therefore, we develop a framework for synthesizing the training data. First, we learn representative human poses from a large corpus of real motion-captured human skeleton data. Next, we fit synthetic 3D humans with different body shapes to each pose and render each from 180 camera viewpoints while randomly varying the clothing textures, background and lighting. Generative Adversarial Networks are employed to minimize the gap between the synthetic and real image distributions. CNN models are then learned that transfer human poses to a shared high-level invariant space. The learned CNN models are then used as invariant feature extractors on real RGB and depth frames of human action videos, and the temporal variations are modelled by a Fourier Temporal Pyramid. Finally, a linear SVM is used for classification. Experiments on three benchmark cross-view human action datasets show that our algorithm outperforms existing methods by significant margins for RGB-only and RGB-D action recognition.
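The classification stage the abstract describes lends itself to a short sketch. Below is a minimal Python illustration, assuming per-frame features from the invariant CNN are already computed: the features are pooled over time with a Fourier Temporal Pyramid (FTP) and classified with a linear SVM. The pyramid depth, the number of retained coefficients, and the function names here are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from sklearn.svm import LinearSVC

def fourier_temporal_pyramid(frame_feats, levels=3, n_coeffs=4):
    """Pool a (T, D) array of per-frame features into one fixed-length
    video descriptor: split time into 1, 2, 4, ... segments, FFT each
    segment along time, keep low-frequency magnitudes."""
    T, D = frame_feats.shape
    parts = []
    for level in range(levels):
        bounds = np.linspace(0, T, 2 ** level + 1, dtype=int)
        for s in range(2 ** level):
            seg = frame_feats[bounds[s]:bounds[s + 1]]
            if seg.shape[0] == 0:            # guard clips shorter than the pyramid
                seg = frame_feats[-1:]
            # Magnitudes of the first few low-frequency coefficients are
            # robust to small temporal shifts within the segment.
            spec = np.abs(np.fft.rfft(seg, axis=0))[:n_coeffs]
            if spec.shape[0] < n_coeffs:     # zero-pad very short segments
                spec = np.vstack([spec, np.zeros((n_coeffs - spec.shape[0], D))])
            parts.append(spec.ravel())
    return np.concatenate(parts)

def train_action_classifier(videos, labels):
    """videos: list of (T_i, D) arrays from the invariant CNN (assumed
    precomputed); labels: one action class per video."""
    X = np.stack([fourier_temporal_pyramid(v) for v in videos])
    clf = LinearSVC(C=1.0)                   # linear SVM, as in the abstract
    return clf.fit(X, labels)
```

Keeping only low-frequency magnitudes makes the descriptor insensitive to small temporal misalignments, which is why FTP is a common choice for comparing action sequences of different lengths.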

Results

Task | Dataset | Metric | Value | Model
Video | NTU RGB+D | Accuracy (CS) | 80.9 | HPM_RGB+HPM_3D+Traj
Video | NTU RGB+D | Accuracy (CV) | 86.1 | HPM_RGB+HPM_3D+Traj
Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 80.9 | HPM_RGB+HPM_3D+Traj
Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 86.1 | HPM_RGB+HPM_3D+Traj
Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 80.9 | HPM_RGB+HPM_3D+Traj
Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 86.1 | HPM_RGB+HPM_3D+Traj
Activity Recognition | NTU RGB+D | Accuracy (CS) | 80.9 | HPM_RGB+HPM_3D+Traj
Activity Recognition | NTU RGB+D | Accuracy (CV) | 86.1 | HPM_RGB+HPM_3D+Traj
Action Localization | NTU RGB+D | Accuracy (CS) | 80.9 | HPM_RGB+HPM_3D+Traj
Action Localization | NTU RGB+D | Accuracy (CV) | 86.1 | HPM_RGB+HPM_3D+Traj
Action Detection | NTU RGB+D | Accuracy (CS) | 80.9 | HPM_RGB+HPM_3D+Traj
Action Detection | NTU RGB+D | Accuracy (CV) | 86.1 | HPM_RGB+HPM_3D+Traj
3D Action Recognition | NTU RGB+D | Accuracy (CS) | 80.9 | HPM_RGB+HPM_3D+Traj
3D Action Recognition | NTU RGB+D | Accuracy (CV) | 86.1 | HPM_RGB+HPM_3D+Traj
Action Recognition | NTU RGB+D | Accuracy (CS) | 80.9 | HPM_RGB+HPM_3D+Traj
Action Recognition | NTU RGB+D | Accuracy (CV) | 86.1 | HPM_RGB+HPM_3D+Traj

CS = cross-subject split; CV = cross-view split (the two standard NTU RGB+D evaluation protocols).

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)