Bastian Wandt, Bodo Rosenhahn
This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by minimizing a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied, and they are prone to overfitting. We tackle the overfitting problem by ignoring 2D-to-3D correspondences. This efficiently avoids a simple memorization of the training data and allows for weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that reprojects the estimated 3D pose back to 2D, which results in a reprojection loss function. Our experiments show that RepNet generalizes well to unknown data and outperforms state-of-the-art methods when applied to unseen data. Moreover, our implementation runs in real time on a standard desktop PC.
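
The reprojection idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 2x3 camera matrix (a weak-perspective model), the Frobenius norm as the loss, and the names `reproject` and `reprojection_loss` are assumptions chosen for illustration. In the actual network, both the 3D pose and the camera would be outputs of learned sub-networks rather than fixed arrays.

```python
import numpy as np


def reproject(pose_3d, camera):
    """Project a 3D pose of shape (3, J) to 2D, shape (2, J).

    `camera` is assumed to be a 2x3 projection matrix (hypothetical choice).
    """
    return camera @ pose_3d


def reprojection_loss(pose_2d, pose_3d, camera):
    """Frobenius norm of the difference between observed and reprojected 2D poses."""
    residual = pose_2d - reproject(pose_3d, camera)
    return float(np.linalg.norm(residual))


# Toy example with J = 3 joints; one column per joint.
pose_3d = np.array([[0.0, 1.0, -1.0],
                    [0.0, 2.0, 2.0],
                    [5.0, 5.5, 4.5]])

# Orthographic camera looking down the z-axis (illustrative values).
camera = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

pose_2d = reproject(pose_3d, camera)

# A 3D pose that is consistent with the observation gives zero loss.
loss_exact = reprojection_loss(pose_2d, pose_3d, camera)

# Shifting the observation makes the loss grow.
loss_shifted = reprojection_loss(pose_2d + 1.0, pose_3d, camera)
```

Because the loss only compares 2D observations with reprojected estimates, it needs no 3D ground truth for the input pose, which is consistent with the weakly supervised setting the abstract describes.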
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | MPI-INF-3DHP | AUC | 54.8 | RepNet (H36M) |
| 3D Human Pose Estimation | MPI-INF-3DHP | MPJPE | 92.5 | RepNet (H36M) |
| 3D Human Pose Estimation | MPI-INF-3DHP | PCK | 81.8 | RepNet (H36M) |
| 3D Human Pose Estimation | MPI-INF-3DHP | AUC | 58.5 | RepNet (3DHP) |
| 3D Human Pose Estimation | MPI-INF-3DHP | MPJPE | 97.8 | RepNet (3DHP) |
| 3D Human Pose Estimation | MPI-INF-3DHP | PCK | 82.5 | RepNet (3DHP) |
| 3D Human Pose Estimation | Human3.6M | Average MPJPE (mm) | 89.9 | RepNet |
| 3D Human Pose Estimation | Human3.6M | Frames Needed | 1 | RepNet |
| 3D Human Pose Estimation | Human3.6M | Number of Frames Per View | 1 | RepNet |
| 3D Human Pose Estimation | Human3.6M | Number of Views | 1 | RepNet |
| Pose Estimation | MPI-INF-3DHP | AUC | 54.8 | RepNet (H36M) |
| Pose Estimation | MPI-INF-3DHP | MPJPE | 92.5 | RepNet (H36M) |
| Pose Estimation | MPI-INF-3DHP | PCK | 81.8 | RepNet (H36M) |
| Pose Estimation | MPI-INF-3DHP | AUC | 58.5 | RepNet (3DHP) |
| Pose Estimation | MPI-INF-3DHP | MPJPE | 97.8 | RepNet (3DHP) |
| Pose Estimation | MPI-INF-3DHP | PCK | 82.5 | RepNet (3DHP) |
| Pose Estimation | Human3.6M | Average MPJPE (mm) | 89.9 | RepNet |
| Pose Estimation | Human3.6M | Frames Needed | 1 | RepNet |
| Pose Estimation | Human3.6M | Number of Frames Per View | 1 | RepNet |
| Pose Estimation | Human3.6M | Number of Views | 1 | RepNet |
| 3D | MPI-INF-3DHP | AUC | 54.8 | RepNet (H36M) |
| 3D | MPI-INF-3DHP | MPJPE | 92.5 | RepNet (H36M) |
| 3D | MPI-INF-3DHP | PCK | 81.8 | RepNet (H36M) |
| 3D | MPI-INF-3DHP | AUC | 58.5 | RepNet (3DHP) |
| 3D | MPI-INF-3DHP | MPJPE | 97.8 | RepNet (3DHP) |
| 3D | MPI-INF-3DHP | PCK | 82.5 | RepNet (3DHP) |
| 3D | Human3.6M | Average MPJPE (mm) | 89.9 | RepNet |
| 3D | Human3.6M | Frames Needed | 1 | RepNet |
| 3D | Human3.6M | Number of Frames Per View | 1 | RepNet |
| 3D | Human3.6M | Number of Views | 1 | RepNet |
| 1 Image, 2*2 Stitchi | MPI-INF-3DHP | AUC | 54.8 | RepNet (H36M) |
| 1 Image, 2*2 Stitchi | MPI-INF-3DHP | MPJPE | 92.5 | RepNet (H36M) |
| 1 Image, 2*2 Stitchi | MPI-INF-3DHP | PCK | 81.8 | RepNet (H36M) |
| 1 Image, 2*2 Stitchi | MPI-INF-3DHP | AUC | 58.5 | RepNet (3DHP) |
| 1 Image, 2*2 Stitchi | MPI-INF-3DHP | MPJPE | 97.8 | RepNet (3DHP) |
| 1 Image, 2*2 Stitchi | MPI-INF-3DHP | PCK | 82.5 | RepNet (3DHP) |
| 1 Image, 2*2 Stitchi | Human3.6M | Average MPJPE (mm) | 89.9 | RepNet |
| 1 Image, 2*2 Stitchi | Human3.6M | Frames Needed | 1 | RepNet |
| 1 Image, 2*2 Stitchi | Human3.6M | Number of Frames Per View | 1 | RepNet |
| 1 Image, 2*2 Stitchi | Human3.6M | Number of Views | 1 | RepNet |
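
For readers unfamiliar with the metrics in the table, the following sketch shows how MPJPE, PCK, and AUC are commonly computed for 3D poses. The 150 mm PCK threshold and the 31 evenly spaced AUC thresholds from 0 to 150 mm are assumptions based on common practice for MPI-INF-3DHP, not values stated on this page, and the function names are hypothetical. Alignment steps that some evaluation protocols apply before measuring errors are omitted.

```python
import numpy as np


def per_joint_errors(pred, gt):
    """Euclidean distance per joint; inputs have shape (..., J, 3), in mm."""
    return np.linalg.norm(pred - gt, axis=-1)


def mpjpe(pred, gt):
    """Mean per-joint position error in mm (lower is better)."""
    return float(per_joint_errors(pred, gt).mean())


def pck(pred, gt, threshold=150.0):
    """Percentage of joints whose error is within `threshold` mm (assumed default)."""
    return float((per_joint_errors(pred, gt) <= threshold).mean() * 100.0)


def auc(pred, gt, thresholds=None):
    """Mean PCK over a range of thresholds (assumed: 0 to 150 mm in 5 mm steps)."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 150.0, 31)
    return float(np.mean([pck(pred, gt, t) for t in thresholds]))


# Toy example: four joints with errors of 30, 60, 120, and 200 mm.
gt = np.zeros((4, 3))
pred = np.array([[30.0, 0.0, 0.0],
                 [0.0, 60.0, 0.0],
                 [0.0, 0.0, 120.0],
                 [0.0, 0.0, 200.0]])

toy_mpjpe = mpjpe(pred, gt)  # mean of the four errors
toy_pck = pck(pred, gt)      # three of four joints fall within 150 mm
toy_auc = auc(pred, gt)
```

MPJPE is an error, so lower values are better, while PCK and AUC are accuracy-style scores where higher values are better.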