Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion

Soyong Shin, Juyong Kim, Eni Halilaj, Michael J. Black

2023-12-12 · CVPR 2024 · 3D Human Pose Estimation
Paper · PDF · Code

Abstract

The estimation of 3D human motion from video has progressed rapidly but current methods still have several key limitations. First, most methods estimate the human in camera coordinates. Second, prior work on estimating humans in global coordinates often assumes a flat ground plane and produces foot sliding. Third, the most accurate methods rely on computationally expensive optimization pipelines, limiting their use to offline applications. Finally, existing video-based methods are surprisingly less accurate than single-frame methods. We address these limitations with WHAM (World-grounded Humans with Accurate Motion), which accurately and efficiently reconstructs 3D human motion in a global coordinate system from video. WHAM learns to lift 2D keypoint sequences to 3D using motion capture data and fuses this with video features, integrating motion context and visual information. WHAM exploits camera angular velocity estimated from a SLAM method together with human motion to estimate the body's global trajectory. We combine this with a contact-aware trajectory refinement method that lets WHAM capture human motion in diverse conditions, such as climbing stairs. WHAM outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks. Code will be available for research purposes at http://wham.is.tue.mpg.de/
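The abstract notes that WHAM fuses camera angular velocity (estimated by a SLAM method) with the human's motion to recover the body's global trajectory. As a minimal illustration of the rotation bookkeeping this involves — not WHAM's actual implementation, and with all function names chosen here for exposition — one can integrate the camera's angular velocity into a camera-to-world rotation and compose it with the per-frame root orientation estimated in camera coordinates:

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector (rad) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def world_root_orientations(root_in_cam, ang_vel, dt):
    """Lift per-frame root orientations from camera to world coordinates.

    root_in_cam: list of 3x3 root rotations expressed in camera coordinates
    ang_vel:     list of camera angular-velocity vectors (rad/s), e.g. from SLAM
    dt:          frame interval (s)
    """
    R_cam = np.eye(3)          # camera-to-world rotation, integrated over time
    world = []
    for R_root, w in zip(root_in_cam, ang_vel):
        world.append(R_cam @ R_root)              # compose out of camera frame
        R_cam = R_cam @ so3_exp(np.asarray(w) * dt)  # integrate one step
    return world
```

This only covers orientation; recovering global translation additionally requires the learned motion/contact cues the paper describes.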

Results

Task                      Dataset  Metric                 Value  Model
3D Human Pose Estimation  EMDB     Average MPJPE (mm)     79.7   WHAM (ViT)
3D Human Pose Estimation  EMDB     Average MPJPE-PA (mm)  50.4   WHAM (ViT)
3D Human Pose Estimation  EMDB     Average MVE (mm)       94.4   WHAM (ViT)
3D Human Pose Estimation  3DPW     MPJPE                  57.8   WHAM (ViT)
3D Human Pose Estimation  3DPW     MPVPE                  68.7   WHAM (ViT)
3D Human Pose Estimation  3DPW     PA-MPJPE               35.9   WHAM (ViT)
3D Human Pose Estimation  RICH     MPJPE                  80.0   WHAM (ViT)
3D Human Pose Estimation  RICH     MPVPE                  91.2   WHAM (ViT)
3D Human Pose Estimation  RICH     PA-MPJPE               44.3   WHAM (ViT)
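For readers unfamiliar with the metrics above: MPJPE is the mean Euclidean distance between predicted and ground-truth joint positions, and PA-MPJPE is the same error after a Procrustes (similarity) alignment that removes global rotation, scale, and translation. A small self-contained implementation (a standard formulation, not code from the WHAM repository):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean joint-wise Euclidean distance.
    pred, gt: (num_joints, 3) arrays in the same units (e.g. mm)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-Aligned MPJPE: MPJPE after the similarity transform
    (rotation, scale, translation) that best aligns pred to gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g            # center both point sets
    # Optimal rotation via SVD of the cross-covariance (Kabsch/Umeyama)
    U, S, Vt = np.linalg.svd(G.T @ P)
    if np.linalg.det(U @ Vt) < 0:            # guard against reflections
        U[:, -1] *= -1
        S[-1] *= -1
    R = U @ Vt
    scale = S.sum() / (P ** 2).sum()         # optimal isotropic scale
    aligned = scale * P @ R.T + mu_g
    return mpjpe(aligned, gt)
```

By construction, PA-MPJPE is never larger than MPJPE on the same pose, which matches the pattern in the table (e.g. 57.8 vs 35.9 on 3DPW).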

Related Papers

Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images (2025-06-24)
ExtPose: Robust and Coherent Pose Estimation by Extending ViTs (2025-06-18)
PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation (2025-06-17)
Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation (2025-06-03)
UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction (2025-05-20)
PoseBench3D: A Cross-Dataset Analysis Framework for 3D Human Pose Estimation (2025-05-16)
HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation (2025-05-07)
Continuous Normalizing Flows for Uncertainty-Aware Human Pose Estimation (2025-05-04)