Yue Zhu, Nermin Samet, David Picard
We present a benchmark for 3D human whole-body pose estimation, which involves identifying accurate 3D keypoints on the entire human body, including the face, hands, body, and feet. Currently, because no fully annotated and accurate 3D whole-body dataset exists, deep networks are either trained separately on specific body parts and combined at inference time, or they rely on pseudo-ground truth from parametric body models, which are less accurate than detection-based methods. To overcome these issues, we introduce the Human3.6M 3D WholeBody (H3WB) dataset, which provides whole-body annotations for the Human3.6M dataset using the COCO-WholeBody layout. H3WB provides annotations of 133 whole-body keypoints for 100K images, made possible by our new multi-view pipeline. We also propose three tasks: i) 3D whole-body pose lifting from a complete 2D whole-body pose, ii) 3D whole-body pose lifting from an incomplete 2D whole-body pose, and iii) 3D whole-body pose estimation from a single RGB image. Additionally, we report several baselines from popular methods for these tasks. Furthermore, we provide automated 3D whole-body annotations of TotalCapture and show experimentally that using them together with H3WB improves performance. Code and dataset are available at https://github.com/wholebody3d/wholebody3d
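All three tasks operate on the 133-keypoint COCO-WholeBody layout. The Python sketch that follows writes that layout as index ranges (17 body, 6 feet, 68 face, 21 per hand, in the standard COCO-WholeBody order) and shows one possible way to simulate an incomplete 2D input for task ii). The masking scheme (dropping whole parts and individual keypoints at random, then zeroing them) and the helper name `make_incomplete` are hypothetical illustrations, not the benchmark's own protocol.

```python
import numpy as np

# COCO-WholeBody layout: 133 keypoints grouped by part (0-based, end-exclusive).
PARTS = {
    "body": (0, 17),
    "feet": (17, 23),
    "face": (23, 91),
    "left_hand": (91, 112),
    "right_hand": (112, 133),
}
NUM_KEYPOINTS = 133


def make_incomplete(pose_2d, rng, part_drop_prob=0.2, kp_drop_prob=0.1):
    """Return (masked_pose, visible) for a (133, 2) 2D whole-body pose.

    Hypothetical masking scheme, for illustration only: each part is dropped
    entirely with probability part_drop_prob, then every remaining keypoint is
    dropped independently with probability kp_drop_prob. Dropped keypoints are
    set to zero and marked False in the boolean `visible` mask.
    """
    pose_2d = np.asarray(pose_2d, dtype=float)
    assert pose_2d.shape == (NUM_KEYPOINTS, 2)
    visible = np.ones(NUM_KEYPOINTS, dtype=bool)
    for start, end in PARTS.values():
        if rng.random() < part_drop_prob:
            visible[start:end] = False
    visible &= rng.random(NUM_KEYPOINTS) >= kp_drop_prob
    masked = np.where(visible[:, None], pose_2d, 0.0)
    return masked, visible


# Usage with a random stand-in pose (pixel coordinates).
rng = np.random.default_rng(0)
pose_2d = rng.uniform(0.0, 1000.0, size=(NUM_KEYPOINTS, 2))
masked, visible = make_incomplete(pose_2d, rng)
print(f"{visible.sum()} of {NUM_KEYPOINTS} keypoints kept")
```

A lifting network for task ii) would then receive `masked` (and possibly `visible`) as input while being supervised on the full 3D pose; whether the mask is given to the model explicitly is a design choice this sketch leaves open.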
Tasks: Facial Recognition and Modelling; Facial Landmark Detection; Face Reconstruction; 3D; 3D Face Modelling; 3D Face Reconstruction

| Dataset | Metric | Value | Model |
|---|---|---|---|
| H3WB | Average MPJPE (mm) | 14.6 | Large SimpleBaseline |
| H3WB | Average MPJPE (mm) | 17.8 | Jointformer |
| H3WB | Average MPJPE (mm) | 17.9 | CanonPose + 3D supervision |
| H3WB | Average MPJPE (mm) | 19.8 | Large SimpleBaseline |
| H3WB | Average MPJPE (mm) | 19.8 | Jointformer |
| H3WB | Average MPJPE (mm) | 20.7 | CPN + Jointformer |
| H3WB | Average MPJPE (mm) | 22.2 | CanonPose + 3D supervision |
| H3WB | Average MPJPE (mm) | 24.6 | CanonPose |
| H3WB | Average MPJPE (mm) | 24.6 | SimpleBaseline |
| H3WB | Average MPJPE (mm) | 26.3 | Resnet50 |
| H3WB | Average MPJPE (mm) | 31.9 | CanonPose |
| H3WB | Average MPJPE (mm) | 32.5 | SHN + SimpleBaseline |
| H3WB | Average MPJPE (mm) | 34 | SimpleBaseline |

Tasks: 3D Human Pose Estimation; Pose Estimation; 3D; 1 Image, 2*2 Stitchi

| Dataset | Metric | Value | Model |
|---|---|---|---|
| H3WB | MPJPE | 84.9 | Jointformer |
| H3WB | MPJPE | 103 | Jointformer |
| H3WB | MPJPE | 112.6 | Large SimpleBaseline |
| H3WB | MPJPE | 117.5 | CanonPose + 3D supervision |
| H3WB | MPJPE | 125.7 | SimpleBaseline |
| H3WB | MPJPE | 131.6 | Large SimpleBaseline |
| H3WB | MPJPE | 142.8 | CPN + Jointformer |
| H3WB | MPJPE | 151.6 | Resnet50 |
| H3WB | MPJPE | 155.9 | CanonPose + 3D supervision |
| H3WB | MPJPE | 189.6 | SHN + SimpleBaseline |
| H3WB | MPJPE | 193.7 | CanonPose |
| H3WB | MPJPE | 252 | SimpleBaseline |
| H3WB | MPJPE | 264.4 | CanonPose |

Tasks: Hand; Pose Estimation; Hand Pose Estimation; 3D; 3D Hand Pose Estimation; 1 Image, 2*2 Stitchi

| Dataset | Metric | Value | Model |
|---|---|---|---|
| H3WB | Average MPJPE (mm) | 31.7 | Large SimpleBaseline |
| H3WB | Average MPJPE (mm) | 38.3 | CanonPose + 3D supervision |
| H3WB | Average MPJPE (mm) | 42.5 | SimpleBaseline |
| H3WB | Average MPJPE (mm) | 43.7 | Jointformer |
| H3WB | Average MPJPE (mm) | 44.8 | Large SimpleBaseline |
| H3WB | Average MPJPE (mm) | 47.4 | CanonPose + 3D supervision |
| H3WB | Average MPJPE (mm) | 48.9 | CanonPose |
| H3WB | Average MPJPE (mm) | 53.5 | Jointformer |
| H3WB | Average MPJPE (mm) | 56.2 | CanonPose |
| H3WB | Average MPJPE (mm) | 56.9 | CPN + Jointformer |
| H3WB | Average MPJPE (mm) | 63.1 | Resnet50 |
| H3WB | Average MPJPE (mm) | 64.3 | SHN + SimpleBaseline |
| H3WB | Average MPJPE (mm) | 83.4 | SimpleBaseline |
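The results report MPJPE (mean per-joint position error), the mean Euclidean distance between predicted and ground-truth 3D keypoints, and for the face- and hand-related tasks an average MPJPE in millimetres. The following is a minimal Python sketch of how such per-part errors could be computed on the 133-keypoint COCO-WholeBody layout. The alignment choices are assumptions for illustration, not necessarily the benchmark's evaluation protocol: the whole body is aligned at the midpoint of the hips, the face at its assumed nose-tip landmark, and each hand at its wrist keypoint, with the two hand errors averaged. `wholebody_report` and the constant names are hypothetical.

```python
import numpy as np

LEFT_HIP, RIGHT_HIP = 11, 12   # COCO-WholeBody body keypoints
FACE = slice(23, 91)           # 68 face landmarks
LEFT_HAND = slice(91, 112)     # 21 keypoints, wrist first
RIGHT_HAND = slice(112, 133)   # 21 keypoints, wrist first
NOSE_TIP_IN_FACE = 30          # assumed nose-tip index within the 68-point face layout


def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding (N, 3) keypoints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def pelvis(pose):
    """Midpoint of the two hip keypoints of a (133, 3) pose."""
    return (pose[LEFT_HIP] + pose[RIGHT_HIP]) / 2


def wholebody_report(pred, gt):
    """Per-part MPJPE for (133, 3) poses under the assumed alignments above."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    report = {"all": mpjpe(pred - pelvis(pred), gt - pelvis(gt))}
    # Face: translate both faces so the nose-tip landmark sits at the origin.
    pf, gf = pred[FACE], gt[FACE]
    report["face"] = mpjpe(pf - pf[NOSE_TIP_IN_FACE], gf - gf[NOSE_TIP_IN_FACE])
    # Hands: align each hand at its wrist (first keypoint), then average both hands.
    hand_errors = []
    for part in (LEFT_HAND, RIGHT_HAND):
        ph, gh = pred[part], gt[part]
        hand_errors.append(mpjpe(ph - ph[0], gh - gh[0]))
    report["hands"] = float(np.mean(hand_errors))
    return report


# Usage with random stand-in poses (millimetres).
rng = np.random.default_rng(0)
gt = rng.normal(scale=500.0, size=(133, 3))
pred = gt + rng.normal(scale=20.0, size=gt.shape)
print(wholebody_report(pred, gt))
```

Aligning each part at its own root makes the face and hand errors independent of how well the global body position is estimated, which is one reason part-level errors can be much smaller than the whole-body MPJPE.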