Saurabh Sharma, Pavan Teja Varigonda, Prashast Bindal, Abhishek Sharma, Arjun Jain
Monocular 3D human-pose estimation from static images is a challenging problem due to the curse of dimensionality and the ill-posed nature of lifting 2D poses to 3D. In this paper, we propose a Deep Conditional Variational Autoencoder (CVAE) based model that synthesizes diverse, anatomically plausible 3D-pose samples conditioned on the estimated 2D pose. We show that the CVAE-based 3D-pose sample set is consistent with the 2D pose and helps tackle the inherent ambiguity in 2D-to-3D lifting. We propose two strategies for obtaining the final 3D pose: (a) using depth-ordering (ordinal) relations to score and weight-average the candidate 3D poses, referred to as OrdinalScore, and (b) using supervision from an Oracle. We report close to state-of-the-art results on two benchmark datasets using OrdinalScore, and state-of-the-art results using the Oracle. We also show that our pipeline yields competitive results without paired image-to-3D annotations. The training and evaluation code is available at https://github.com/ssfootball04/generative_pose.
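The OrdinalScore idea described above (score each candidate 3D pose by its depth-ordering relations, then weight-average the candidates) can be sketched as follows. This is a minimal illustration, not the released implementation: the pairwise sign-agreement score, the softmax weighting, the `temperature` parameter, and all function names are assumptions made for this example, and the target ordinal matrix is assumed to come from some separate predictor.

```python
import numpy as np

def ordinal_matrix(pose, tol=0.0):
    """Pairwise depth-ordering matrix for one pose of shape (J, 3).

    Entry (i, j) is +1 if joint i has larger z than joint j, -1 if smaller,
    and 0 if the two depths differ by at most `tol`.
    """
    z = pose[:, 2]
    diff = z[:, None] - z[None, :]
    return np.sign(diff) * (np.abs(diff) > tol)

def ordinal_score(candidates, target_ordinals):
    """Fraction of joint pairs whose depth ordering matches the target.

    candidates: (K, J, 3) array of sampled 3D poses.
    target_ordinals: (J, J) matrix with entries in {-1, 0, +1}.
    Returns an array of K scores in [0, 1].
    """
    num_joints = candidates.shape[1]
    upper = np.triu_indices(num_joints, k=1)  # each unordered pair once
    scores = [np.mean(ordinal_matrix(p)[upper] == target_ordinals[upper])
              for p in candidates]
    return np.asarray(scores)

def weighted_average_pose(candidates, scores, temperature=0.1):
    """Softmax-weighted mean of the candidates (assumed weighting scheme)."""
    logits = scores / temperature
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    # Contract the candidate axis: (K,) with (K, J, 3) gives (J, 3).
    return np.tensordot(weights, candidates, axes=1), weights

# Usage with synthetic data: K = 5 candidates, J = 17 joints.
rng = np.random.default_rng(0)
reference = rng.normal(size=(17, 3))
samples = reference + 0.1 * rng.normal(size=(5, 17, 3))
scores = ordinal_score(samples, ordinal_matrix(reference))
final_pose, weights = weighted_average_pose(samples, scores)
```

A lower `temperature` concentrates the weight on the best-scoring candidates, while a higher one approaches a plain mean of the sample set.
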
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | HumanEva-I | Mean Reconstruction Error (mm) | 23.9 | Ours (Oracle) |
| 3D Human Pose Estimation | Human3.6M | Average MPJPE (mm) | 58 | MultiPoseNet |
| 3D Human Pose Estimation | Human3.6M | Frames Needed | 1 | MultiPoseNet |
| 3D Human Pose Estimation | Human3.6M | Average MPJPE (mm) | 46.8 | Sharma et al. |
| 3D Human Pose Estimation | Human3.6M | Average PMPJPE (mm) | 37.3 | Sharma et al. |
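For readers unfamiliar with the metrics in the table, the sketch below shows how MPJPE and its Procrustes-aligned variant (PMPJPE) are commonly computed for a single pose. It assumes poses are `(J, 3)` arrays in millimetres and uses a standard similarity (Umeyama-style) alignment; the exact evaluation protocol behind the reported numbers is not stated here, and the function names are chosen for this example only.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def procrustes_align(pred, gt):
    """Align pred to gt with the best similarity transform
    (scale, rotation, translation) in the least-squares sense."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_pred, gt - mu_gt
    # SVD of the 3x3 cross-covariance between the centred point sets.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    fix = np.diag([1.0, 1.0, d])
    rotation = vt.T @ fix @ u.T
    scale = (s * np.diag(fix)).sum() / (p ** 2).sum()
    return scale * p @ rotation.T + mu_gt

def pmpjpe(pred, gt):
    """MPJPE measured after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because the alignment removes global scale, rotation, and translation, PMPJPE is never larger than the best MPJPE achievable by any similarity transform of the prediction, which is why the PMPJPE values in the table are lower than the corresponding MPJPE values.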