Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A simple yet effective baseline for 3d human pose estimation

Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little

2017-05-08 | ICCV 2017 10 | 3D Human Pose Estimation, Monocular 3D Human Pose Estimation, Pose Estimation, 3D Pose Estimation

Abstract

Following the success of deep convolutional networks, state-of-the-art methods for 3d human pose estimation have focused on deep end-to-end systems that predict 3d joint locations given raw image pixels. Despite their excellent performance, it is often not easy to understand whether their remaining error stems from a limited 2d pose (visual) understanding, or from a failure to map 2d poses into 3-dimensional positions. With the goal of understanding these sources of error, we set out to build a system that, given 2d joint locations, predicts 3d positions. Much to our surprise, we have found that, with current technology, "lifting" ground truth 2d joint locations to 3d space is a task that can be solved with a remarkably low error rate: a relatively simple deep feed-forward network outperforms the best reported result by about 30% on Human3.6M, the largest publicly available 3d pose estimation benchmark. Furthermore, training our system on the output of an off-the-shelf state-of-the-art 2d detector (i.e., using images as input) yields state-of-the-art results; this includes an array of systems that have been trained end-to-end specifically for this task. Our results indicate that a large portion of the error of modern deep 3d pose estimation systems stems from their visual analysis, and suggest directions to further advance the state of the art in 3d human pose estimation.
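To make the "lifting" idea concrete, the following is a minimal sketch of a feed-forward mapping from 2d joint locations to 3d positions. It only illustrates the input and output shapes; the weights are random and untrained. The names `init_params` and `lift`, the hidden width, the number of layers, and the 17-joint skeleton are all assumptions made for illustration and are not taken from this page or presented as the authors' configuration.

```python
import numpy as np


def init_params(num_joints, hidden=1024, seed=0):
    """Random weights for a small fully connected network (illustrative only)."""
    rng = np.random.default_rng(seed)
    d_in, d_out = 2 * num_joints, 3 * num_joints
    sizes = [(d_in, hidden), (hidden, hidden), (hidden, d_out)]
    params = []
    for fan_in, fan_out in sizes:
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((w, b))
    return params


def lift(params, joints_2d):
    """Map a batch of 2d poses (batch, joints, 2) to 3d poses (batch, joints, 3)."""
    batch, num_joints, _ = joints_2d.shape
    h = joints_2d.reshape(batch, -1)          # flatten each pose to a vector
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)        # linear layer followed by ReLU
    w_out, b_out = params[-1]
    out = h @ w_out + b_out                   # final linear layer, no activation
    return out.reshape(batch, num_joints, 3)


if __name__ == "__main__":
    params = init_params(num_joints=17)
    poses_2d = np.random.default_rng(1).normal(size=(4, 17, 2))
    poses_3d = lift(params, poses_2d)
    print(poses_3d.shape)
```

In a real system the weights would be fitted by regressing against ground-truth 3d joints; this sketch omits training entirely.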

Results

Task                      Dataset     Metric                          Value  Model
3D Human Pose Estimation  HumanEva-I  Mean Reconstruction Error (mm)  24.6   SIM (SH detections)
3D Human Pose Estimation  3DPW        PA-MPJPE                        157    Simple-baseline
3D Human Pose Estimation  Human3.6M   Average MPJPE (mm)              62.9   SIM (SH detections FT) (MA)
3D Human Pose Estimation  Human3.6M   Frames Needed                   1      SIM (SH detections FT) (MA)
Pose Estimation           HumanEva-I  Mean Reconstruction Error (mm)  24.6   SIM (SH detections)
Pose Estimation           3DPW        PA-MPJPE                        157    Simple-baseline
Pose Estimation           Human3.6M   Average MPJPE (mm)              62.9   SIM (SH detections FT) (MA)
Pose Estimation           Human3.6M   Frames Needed                   1      SIM (SH detections FT) (MA)
3D                        HumanEva-I  Mean Reconstruction Error (mm)  24.6   SIM (SH detections)
3D                        3DPW        PA-MPJPE                        157    Simple-baseline
3D                        Human3.6M   Average MPJPE (mm)              62.9   SIM (SH detections FT) (MA)
3D                        Human3.6M   Frames Needed                   1      SIM (SH detections FT) (MA)
1 Image, 2*2 Stitchi      HumanEva-I  Mean Reconstruction Error (mm)  24.6   SIM (SH detections)
1 Image, 2*2 Stitchi      3DPW        PA-MPJPE                        157    Simple-baseline
1 Image, 2*2 Stitchi      Human3.6M   Average MPJPE (mm)              62.9   SIM (SH detections FT) (MA)
1 Image, 2*2 Stitchi      Human3.6M   Frames Needed                   1      SIM (SH detections FT) (MA)
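For readers unfamiliar with the metrics in the table, the sketch below shows how MPJPE and PA-MPJPE are commonly computed for a single pose: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PA-MPJPE is the same distance after a Procrustes (similarity) alignment of the prediction to the ground truth. The function names are hypothetical, and this follows the usual definitions rather than any evaluation script referenced on this page; benchmark protocols may differ in details such as root-joint centering.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error for one pose; pred and gt are (joints, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Best similarity transform (scale, rotation, translation) of pred onto gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g                 # center both point sets
    u, s, vt = np.linalg.svd(p.T @ g)             # SVD of the 3x3 cross-covariance
    d = np.ones(3)
    if np.linalg.det(u @ vt) < 0:                 # avoid returning a reflection
        d[-1] = -1.0
    rot = u @ np.diag(d) @ vt                     # rotation for row-vector points
    scale = (s * d).sum() / (p ** 2).sum()        # least-squares optimal scale
    return scale * p @ rot + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(17, 3)) * 100.0         # synthetic pose, units of mm
    pred = gt + rng.normal(size=(17, 3)) * 10.0   # noisy synthetic prediction
    print("MPJPE:", mpjpe(pred, gt))
    print("PA-MPJPE:", pa_mpjpe(pred, gt))
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE measures only the error in the pose's shape.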

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)