A simple yet effective baseline for 3d human pose estimation

Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little

2017-05-08 | ICCV 2017 | Tasks: 3D Human Pose Estimation, Monocular 3D Human Pose Estimation, Pose Estimation, 3D Pose Estimation


Abstract

Following the success of deep convolutional networks, state-of-the-art methods for 3d human pose estimation have focused on deep end-to-end systems that predict 3d joint locations given raw image pixels. Despite their excellent performance, it is often not easy to understand whether their remaining error stems from a limited 2d pose (visual) understanding, or from a failure to map 2d poses into 3-dimensional positions. With the goal of understanding these sources of error, we set out to build a system that, given 2d joint locations, predicts 3d positions. Much to our surprise, we have found that, with current technology, "lifting" ground truth 2d joint locations to 3d space is a task that can be solved with a remarkably low error rate: a relatively simple deep feed-forward network outperforms the best reported result by about 30% on Human3.6M, the largest publicly available 3d pose estimation benchmark. Furthermore, training our system on the output of an off-the-shelf state-of-the-art 2d detector (i.e., using images as input) yields state-of-the-art results – this includes an array of systems that have been trained end-to-end specifically for this task. Our results indicate that a large portion of the error of modern deep 3d pose estimation systems stems from their visual analysis, and suggest directions to further advance the state of the art in 3d human pose estimation.
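The "lifting" step described in the abstract can be illustrated with a minimal sketch of a feed-forward network that maps 2d joint coordinates to 3d positions. Everything specific here is an assumption made for illustration and is not stated on this page: the joint count (16), the hidden width (1024), the two residual blocks, the He initialisation, and the names `init_lifter` and `lift`. The weights are random and untrained, and training-time components such as normalisation or dropout are omitted, so this shows only the shape of the computation, not the authors' implementation.

```python
import numpy as np


def init_lifter(n_joints=16, hidden=1024, n_blocks=2, seed=0):
    """Randomly initialise weights for a 2d-to-3d lifting MLP (untrained)."""
    rng = np.random.default_rng(seed)

    def linear(n_in, n_out):
        # He initialisation, a common choice for ReLU layers (assumption).
        weight = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out))
        return weight, np.zeros(n_out)

    return {
        "inp": linear(2 * n_joints, hidden),
        "blocks": [
            (linear(hidden, hidden), linear(hidden, hidden))
            for _ in range(n_blocks)
        ],
        "out": linear(hidden, 3 * n_joints),
    }


def lift(params, pose_2d):
    """Map a batch of 2d poses (B, J, 2) to 3d poses (B, J, 3)."""
    batch, n_joints, _ = pose_2d.shape
    x = pose_2d.reshape(batch, -1)           # flatten joints: (B, 2J)
    w, b = params["inp"]
    h = np.maximum(x @ w + b, 0.0)           # linear + ReLU
    for (w1, b1), (w2, b2) in params["blocks"]:
        r = np.maximum(h @ w1 + b1, 0.0)
        r = np.maximum(r @ w2 + b2, 0.0)
        h = h + r                            # residual connection
    w, b = params["out"]
    return (h @ w + b).reshape(batch, n_joints, 3)


params = init_lifter()
pose_2d = np.random.default_rng(1).normal(size=(4, 16, 2))  # fake 2d detections
pose_3d = lift(params, pose_2d)
print(pose_3d.shape)  # (4, 16, 3)
```

Because the input is only a handful of coordinates per frame rather than an image, a network like this is small and fast to evaluate, which is what makes it useful for separating 2d detection error from 2d-to-3d lifting error.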

Results

Task	Dataset	Metric	Value	Model
3D Human Pose Estimation	HumanEva-I	Mean Reconstruction Error (mm)	24.6	SIM (SH detections)
3D Human Pose Estimation	3DPW	PA-MPJPE	157	Simple-baseline
3D Human Pose Estimation	Human3.6M	Average MPJPE (mm)	62.9	SIM (SH detections FT) (MA)
3D Human Pose Estimation	Human3.6M	Frames Needed	1	SIM (SH detections FT) (MA)
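
The MPJPE metric in the table is the mean per-joint position error: the Euclidean distance between each predicted joint and its ground-truth position, averaged over joints. A minimal sketch follows; the function name `mpjpe` and the toy coordinates are illustrative assumptions, not values from any dataset. PA-MPJPE applies a rigid (Procrustes) alignment of the prediction to the ground truth before measuring the same error, and that alignment step is not shown here.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error: average Euclidean distance over joints."""
    if len(predicted) != len(target):
        raise ValueError("poses must have the same number of joints")
    return sum(math.dist(p, t) for p, t in zip(predicted, target)) / len(target)


# Toy 3-joint pose in millimetres (made-up numbers for illustration).
target = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (100.0, 200.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (103.0, 4.0, 0.0), (100.0, 200.0, 12.0)]

# Per-joint distances are 0, 5 and 12 mm, so the mean is 17 / 3 mm.
print(mpjpe(predicted, target))
```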
