Wen Guo, Enric Corona, Francesc Moreno-Noguer, Xavier Alameda-Pineda
Recent literature has addressed the monocular 3D pose estimation task very satisfactorily. In these studies, different persons are usually treated as independent pose instances to estimate. However, in many everyday situations, people are interacting, and the pose of an individual depends on the pose of his/her interactees. In this paper, we investigate how to exploit this dependency to enhance current - and possibly future - deep networks for 3D monocular pose estimation. Our pose interacting network, or PI-Net, feeds the initial pose estimates of a variable number of interactees into a recurrent architecture used to refine the pose of the person-of-interest. Evaluating such a method is challenging due to the limited availability of public annotated multi-person 3D human pose datasets. We demonstrate the effectiveness of our method on the MuPoTS dataset, setting a new state of the art on it. Qualitative results on other multi-person datasets (for which 3D pose ground-truth is not available) showcase the proposed PI-Net. PI-Net is implemented in PyTorch and the code will be made available upon acceptance of the paper.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Multi-Person Pose Estimation (root-relative) | MuPoTS-3D | 3DPCK | 82.5 | PI-Net |
| 3D Human Pose Estimation | MuPoTS-3D | 3DPCK | 82.5 | PI-Net |
| Pose Estimation | MuPoTS-3D | 3DPCK | 82.5 | PI-Net |
| 3D | MuPoTS-3D | 3DPCK | 82.5 | PI-Net |
| 3D Multi-Person Pose Estimation | MuPoTS-3D | 3DPCK | 82.5 | PI-Net |
| 1 Image, 2*2 Stitchi | MuPoTS-3D | 3DPCK | 82.5 | PI-Net |
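
The 3DPCK values above are percentages of correctly localized 3D keypoints. As a rough illustration of how such a score can be computed, the sketch below counts a joint as correct when its Euclidean distance to the ground truth is within a threshold. The function name `pck3d`, the 150 mm default threshold, the inclusive comparison, and the assumption that joints are already root-relative and expressed in millimetres are all illustrative choices, not details taken from the PI-Net code or the official MuPoTS-3D evaluation script.

```python
import math


def pck3d(pred, gt, threshold_mm=150.0):
    """Percentage of joints whose 3D error is within threshold_mm.

    pred, gt: equal-length sequences of (x, y, z) joint positions,
    assumed here to be root-relative and in millimetres.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    # A joint counts as correct if its Euclidean error does not exceed
    # the threshold (inclusive comparison is an assumption of this sketch).
    correct = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold_mm)
    return 100.0 * correct / len(gt)


# Toy example with four joints; the ground truth is placed at the origin.
gt_joints = [(0.0, 0.0, 0.0)] * 4
pred_joints = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 200.0, 0.0), (0.0, 0.0, 149.0)]
score = pck3d(pred_joints, gt_joints)
print(score)
```

In this toy example three of the four joints fall within the threshold. A full benchmark evaluation would additionally aggregate over all annotated persons and frames, and may match predictions to ground-truth persons first; those steps are omitted here.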