Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion

Soyong Shin, Juyong Kim, Eni Halilaj, Michael J. Black

2023-12-12 · CVPR 2024 · 3D Human Pose Estimation
Paper · PDF · Code

Abstract

The estimation of 3D human motion from video has progressed rapidly but current methods still have several key limitations. First, most methods estimate the human in camera coordinates. Second, prior work on estimating humans in global coordinates often assumes a flat ground plane and produces foot sliding. Third, the most accurate methods rely on computationally expensive optimization pipelines, limiting their use to offline applications. Finally, existing video-based methods are surprisingly less accurate than single-frame methods. We address these limitations with WHAM (World-grounded Humans with Accurate Motion), which accurately and efficiently reconstructs 3D human motion in a global coordinate system from video. WHAM learns to lift 2D keypoint sequences to 3D using motion capture data and fuses this with video features, integrating motion context and visual information. WHAM exploits camera angular velocity estimated from a SLAM method together with human motion to estimate the body's global trajectory. We combine this with a contact-aware trajectory refinement method that lets WHAM capture human motion in diverse conditions, such as climbing stairs. WHAM outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks. Code will be available for research purposes at http://wham.is.tue.mpg.de/
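The abstract notes that WHAM fuses camera angular velocity (estimated by a SLAM method) with the human's motion to recover the body's global trajectory. As a minimal illustration of the rotation bookkeeping this involves — not WHAM's actual implementation, and with all function names chosen here for exposition — one can integrate the camera's angular velocity into a camera-to-world rotation and compose it with the per-frame root orientation estimated in camera coordinates:

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector (rad) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def world_root_orientations(root_in_cam, ang_vel, dt):
    """Lift per-frame root orientations from camera to world coordinates.

    root_in_cam: list of 3x3 root rotations expressed in camera coordinates
    ang_vel:     list of camera angular-velocity vectors (rad/s), e.g. from SLAM
    dt:          frame interval (s)
    """
    R_cam = np.eye(3)          # camera-to-world rotation, integrated over time
    world = []
    for R_root, w in zip(root_in_cam, ang_vel):
        world.append(R_cam @ R_root)              # compose out of camera frame
        R_cam = R_cam @ so3_exp(np.asarray(w) * dt)  # integrate one step
    return world
```

This only covers orientation; recovering global translation additionally requires the learned motion/contact cues the paper describes.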

Results

Task                      Dataset  Metric                 Value  Model
3D Human Pose Estimation  EMDB     Average MPJPE (mm)     79.7   WHAM (ViT)
3D Human Pose Estimation  EMDB     Average MPJPE-PA (mm)  50.4   WHAM (ViT)
3D Human Pose Estimation  EMDB     Average MVE (mm)       94.4   WHAM (ViT)
3D Human Pose Estimation  3DPW     MPJPE                  57.8   WHAM (ViT)
3D Human Pose Estimation  3DPW     MPVPE                  68.7   WHAM (ViT)
3D Human Pose Estimation  3DPW     PA-MPJPE               35.9   WHAM (ViT)
3D Human Pose Estimation  RICH     MPJPE                  80.0   WHAM (ViT)
3D Human Pose Estimation  RICH     MPVPE                  91.2   WHAM (ViT)
3D Human Pose Estimation  RICH     PA-MPJPE               44.3   WHAM (ViT)
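For readers unfamiliar with the metrics above: MPJPE is the mean Euclidean distance between predicted and ground-truth joint positions, and PA-MPJPE is the same error after a Procrustes (similarity) alignment that removes global rotation, scale, and translation. A small self-contained implementation (a standard formulation, not code from the WHAM repository):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean joint-wise Euclidean distance.
    pred, gt: (num_joints, 3) arrays in the same units (e.g. mm)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-Aligned MPJPE: MPJPE after the similarity transform
    (rotation, scale, translation) that best aligns pred to gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g            # center both point sets
    # Optimal rotation via SVD of the cross-covariance (Kabsch/Umeyama)
    U, S, Vt = np.linalg.svd(G.T @ P)
    if np.linalg.det(U @ Vt) < 0:            # guard against reflections
        U[:, -1] *= -1
        S[-1] *= -1
    R = U @ Vt
    scale = S.sum() / (P ** 2).sum()         # optimal isotropic scale
    aligned = scale * P @ R.T + mu_g
    return mpjpe(aligned, gt)
```

By construction, PA-MPJPE is never larger than MPJPE on the same pose, which matches the pattern in the table (e.g. 57.8 vs 35.9 on 3DPW).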

Related Papers

Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images (2025-06-24)
ExtPose: Robust and Coherent Pose Estimation by Extending ViTs (2025-06-18)
PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation (2025-06-17)
Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation (2025-06-03)
UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction (2025-05-20)
PoseBench3D: A Cross-Dataset Analysis Framework for 3D Human Pose Estimation (2025-05-16)
HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation (2025-05-07)
Continuous Normalizing Flows for Uncertainty-Aware Human Pose Estimation (2025-05-04)