Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models

Yongtao Ge, Wenjia Wang, Yongfan Chen, Hao Chen, Chunhua Shen

2024-03-17
Tasks: 3D Human Pose Estimation, 3D human pose and shape estimation, 3D Human Reconstruction

Abstract

In this work, we show that synthetic data created by generative models is complementary to computer graphics (CG) rendered data for achieving remarkable generalization performance on diverse real-world scenes for 3D human pose and shape estimation (HPS). Specifically, we propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations. We first collect a large-scale human-centric dataset with comprehensive annotations, e.g., text captions and surface normal images. Then, we train a customized ControlNet model upon this dataset to generate diverse human images and initial ground-truth labels. At the core of this step is that we can easily obtain numerous surface normal images from a 3D human parametric model, e.g., SMPL-X, by rendering the 3D mesh onto the image plane. As there exists inevitable noise in the initial labels, we then apply an off-the-shelf foundation segmentation model, i.e., SAM, to filter negative data samples. Our data generation pipeline is flexible and customizable to facilitate different real-world tasks, e.g., ego-centric scenes and perspective-distortion scenes. The generated dataset comprises 0.79M images with corresponding 3D annotations, covering versatile viewpoints, scenes, and human identities. We train various HPS regressors on top of the generated data and evaluate them on a wide range of benchmarks (3DPW, RICH, EgoBody, AGORA, SSP-3D) to verify the effectiveness of the generated data. By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
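
The abstract does not spell out how SAM is used to filter negative samples. One plausible reading, offered here only as an illustrative assumption, is to compare the silhouette rendered from the SMPL-X mesh with the person mask predicted by a segmentation model, and to drop samples whose overlap is low. The sketch below uses plain Python lists as masks; `mask_iou`, `keep_sample`, and the 0.8 threshold are hypothetical names and values, not taken from the paper.

```python
from typing import List

# Binary mask stored as rows of booleans (a hypothetical representation
# chosen for this sketch; real pipelines would typically use arrays).
Mask = List[List[bool]]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two same-sized binary masks."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += bool(pa and pb)
            union += bool(pa or pb)
    # Two empty masks have no overlap to measure; treat that as IoU 0.
    return inter / union if union else 0.0


def keep_sample(rendered: Mask, segmented: Mask, threshold: float = 0.8) -> bool:
    """Keep a generated sample only if the two masks agree well enough.

    `rendered` stands for the silhouette of the projected mesh and
    `segmented` for the mask a segmentation model returns on the
    generated image. The threshold is an assumed value.
    """
    return mask_iou(rendered, segmented) >= threshold


if __name__ == "__main__":
    rendered = [[True, True], [False, False]]
    consistent = [[True, True], [False, False]]
    inconsistent = [[True, False], [False, True]]
    print(keep_sample(rendered, consistent))
    print(keep_sample(rendered, inconsistent))
```

A mask-overlap test is only one way to detect a mismatch between a generated image and its label; the paper may use a different criterion.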

Results

| Task | Dataset | Metric | Value (mm) | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | 3DPW | MPJPE | 65.2 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| 3D Human Pose Estimation | 3DPW | MPVPE | 76.8 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| 3D Human Pose Estimation | 3DPW | PA-MPJPE | 41.9 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| 3D Human Pose Estimation | 3DPW | MPJPE | 87.3 | CLIFF |
| 3D Human Pose Estimation | 3DPW | MPVPE | 102.1 | CLIFF |
| 3D Human Pose Estimation | 3DPW | PA-MPJPE | 52.7 | CLIFF |
| Pose Estimation | 3DPW | MPJPE | 65.2 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| Pose Estimation | 3DPW | MPVPE | 76.8 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| Pose Estimation | 3DPW | PA-MPJPE | 41.9 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| Pose Estimation | 3DPW | MPJPE | 87.3 | CLIFF |
| Pose Estimation | 3DPW | MPVPE | 102.1 | CLIFF |
| Pose Estimation | 3DPW | PA-MPJPE | 52.7 | CLIFF |
| 3D | 3DPW | MPJPE | 65.2 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| 3D | 3DPW | MPVPE | 76.8 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| 3D | 3DPW | PA-MPJPE | 41.9 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| 3D | 3DPW | MPJPE | 87.3 | CLIFF |
| 3D | 3DPW | MPVPE | 102.1 | CLIFF |
| 3D | 3DPW | PA-MPJPE | 52.7 | CLIFF |
| 1 Image, 2*2 Stitchi | 3DPW | MPJPE | 65.2 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| 1 Image, 2*2 Stitchi | 3DPW | MPVPE | 76.8 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| 1 Image, 2*2 Stitchi | 3DPW | PA-MPJPE | 41.9 | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) |
| 1 Image, 2*2 Stitchi | 3DPW | MPJPE | 87.3 | CLIFF |
| 1 Image, 2*2 Stitchi | 3DPW | MPVPE | 102.1 | CLIFF |
| 1 Image, 2*2 Stitchi | 3DPW | PA-MPJPE | 52.7 | CLIFF |
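
For readers unfamiliar with the metrics in the results above, the following is a minimal sketch of how MPJPE is commonly computed (mean Euclidean distance between predicted and ground-truth joints, usually after aligning both skeletons at a root joint), together with the relative error reduction between the two CLIFF rows. MPVPE applies the same formula to mesh vertices instead of joints. PA-MPJPE additionally applies a Procrustes (similarity) alignment before measuring, which is omitted here. The function names and the choice of joint 0 as root are assumptions for illustration, not the benchmark's official evaluation code.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def root_align(points: List[Point], root_index: int = 0) -> List[Tuple[float, ...]]:
    """Translate all points so the chosen root joint sits at the origin."""
    root = points[root_index]
    return [tuple(c - r for c, r in zip(p, root)) for p in points]


def mpjpe(pred: List[Point], gt: List[Point]) -> float:
    """Mean per-joint position error: average Euclidean distance per joint."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` lowers the error of `baseline`."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Values copied from the 3DPW results: (row labeled "CLIFF",
    # row labeled "CLIFF (3DPW+HumanWild+BEDLAM+AGORA)").
    results = {
        "MPJPE": (87.3, 65.2),
        "MPVPE": (102.1, 76.8),
        "PA-MPJPE": (52.7, 41.9),
    }
    for metric, (baseline, improved) in results.items():
        print(f"{metric}: {relative_reduction(baseline, improved):.1f}% lower")
```

With the listed values, the relative reductions come to roughly 25.3% for MPJPE, 24.8% for MPVPE, and 20.5% for PA-MPJPE.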

Related Papers

Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images (2025-06-24)
ExtPose: Robust and Coherent Pose Estimation by Extending ViTs (2025-06-18)
PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation (2025-06-17)
PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images (2025-06-16)
SMPL Normal Map Is All You Need for Single-view Textured Human Reconstruction (2025-06-15)
Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation (2025-06-03)
HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers (2025-06-03)
UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction (2025-05-20)