Hanyue Tu, Chunyu Wang, Wen-Jun Zeng
We present an approach to estimate 3D poses of multiple people from multiple camera views. In contrast to previous efforts, which require establishing cross-view correspondences based on noisy and incomplete 2D pose estimates, we present an end-to-end solution that operates directly in 3D space and therefore avoids making incorrect decisions in the 2D space. To achieve this goal, the features from all camera views are warped and aggregated in a common 3D space and fed into a Cuboid Proposal Network (CPN) to coarsely localize all people. We then propose a Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal. The approach is robust to occlusion, which occurs frequently in practice. Without bells and whistles, it outperforms the state of the art on public datasets. Code will be released at https://github.com/microsoft/multiperson-pose-estimation-pytorch.
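The core idea of warping and aggregating per-view features into a common 3D space can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, nearest-neighbour sampling, and simple averaging across views are all assumptions; the actual method aggregates multi-channel joint heatmaps over a dense voxel grid before feeding them to the CPN.

```python
import numpy as np

def aggregate_voxel_features(heatmaps, cameras, grid_points):
    """Sketch of multi-view feature aggregation in a common 3D space.

    heatmaps:    list of (H, W) single-joint 2D heatmaps, one per view
    cameras:     list of (3, 4) camera projection matrices, one per view
    grid_points: (N, 3) voxel centers in world coordinates
    Returns (N,) scores: each voxel's heatmap value averaged over views.
    """
    n = grid_points.shape[0]
    scores = np.zeros(n)
    # Homogeneous world coordinates for projection.
    pts_h = np.hstack([grid_points, np.ones((n, 1))])
    for hm, P in zip(heatmaps, cameras):
        uvw = pts_h @ P.T                  # project voxels into the image
        uv = uvw[:, :2] / uvw[:, 2:3]      # perspective divide
        # Nearest-neighbour sampling, clamped to the image bounds
        # (an assumption; bilinear sampling is the common choice).
        u = np.clip(np.round(uv[:, 0]).astype(int), 0, hm.shape[1] - 1)
        v = np.clip(np.round(uv[:, 1]).astype(int), 0, hm.shape[0] - 1)
        scores += hm[v, u]
    return scores / len(heatmaps)
```

Voxels whose projections land on high-confidence 2D detections in many views receive high aggregated scores, which is what lets a proposal network localize people directly in 3D without explicit cross-view matching.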
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Multi-Person Pose Estimation | Panoptic | Average MPJPE (mm) | 17.68 | VoxelPose |
| 3D Multi-Person Pose Estimation | Shelf | PCP3D | 97 | VoxelPose |
| 3D Multi-Person Pose Estimation | Campus | PCP3D | 96.7 | VoxelPose |