Wenbo Li, Zhicheng Wang, Binyi Yin, Qixiang Peng, Yuming Du, Tianzi Xiao, Gang Yu, Hongtao Lu, Yichen Wei, Jian Sun
Existing pose estimation approaches fall into two categories: single-stage and multi-stage methods. While multi-stage methods are seemingly better suited to the task, their performance in current practice is not as good as that of single-stage methods. This work studies this issue. We argue that the unsatisfactory performance of current multi-stage methods stems from insufficiencies in various design choices. We propose several improvements, including the single-stage module design, cross-stage feature aggregation, and coarse-to-fine supervision. The resulting method establishes a new state of the art on both the MS COCO and MPII Human Pose datasets, justifying the effectiveness of a multi-stage architecture. The source code is publicly available for further research.
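The abstract names coarse-to-fine supervision without spelling out how it works. One plausible reading, offered here only as an illustrative sketch, is that each stage is supervised with a Gaussian keypoint heatmap whose spread shrinks from early stages to late ones, so later stages must localize more precisely. The function names, the heatmap size, the number of stages, and the sigma values below are all assumptions made for illustration; none of them come from the text above.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma):
    """Return a height x width grid holding a 2D Gaussian centred at (cx, cy)."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def coarse_to_fine_targets(width, height, cx, cy, sigmas):
    """Build one target heatmap per stage.

    `sigmas` is expected to decrease from the first stage to the last,
    so early stages get a broad (coarse) target and late stages a sharp one.
    """
    return [gaussian_heatmap(width, height, cx, cy, s) for s in sigmas]


# Hypothetical setup: four stages, sigma values chosen purely for illustration.
targets = coarse_to_fine_targets(48, 64, cx=20, cy=30, sigmas=[4.0, 3.0, 2.0, 1.0])

# Total heatmap mass per stage; it shrinks as the target gets sharper.
spread = [sum(map(sum, t)) for t in targets]
```

Every target keeps its peak at the annotated keypoint, while the total mass in `spread` falls from stage to stage, which is the sense in which the supervision tightens under this assumed formulation.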
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | COCO minival | AP | 75.9 | MSPN |
| Pose Estimation | COCO test-dev | AP | 76.1 | MSPN |
| Pose Estimation | COCO test-dev | AP50 | 93.4 | MSPN |
| Pose Estimation | COCO test-dev | AP75 | 83.8 | MSPN |
| Pose Estimation | COCO test-dev | APL | 81.5 | MSPN |
| Pose Estimation | COCO test-dev | APM | 72.3 | MSPN |
| Pose Estimation | COCO test-dev | AR | 81.6 | MSPN |
| Pose Estimation | MPII Human Pose | PCKh-0.5 | 92.6 | MSPN |
| Pose Estimation | COCO test-dev | AR50 | 96.3 | MSPN |
| Pose Estimation | COCO test-dev | AR75 | 88.1 | MSPN |
| Pose Estimation | COCO test-dev | ARL | 87.1 | MSPN |
| Pose Estimation | COCO test-dev | ARM | 77.5 | MSPN |
| Pose Estimation | COCO (Common Objects in Context) | Test AP | 76.1 | MSPN(384x288) |
| Pose Estimation | COCO test-challenge | AP | 76.4 | MSPN+* |
| Pose Estimation | COCO test-challenge | AP50 | 92.9 | MSPN+* |
| Pose Estimation | COCO test-challenge | AP75 | 82.6 | MSPN+* |
| Pose Estimation | COCO test-challenge | APL | 88.6 | MSPN+* |
| Pose Estimation | COCO test-challenge | AR | 82.2 | MSPN+* |
| Pose Estimation | COCO test-challenge | AR50 | 96.0 | MSPN+* |
| Pose Estimation | COCO test-challenge | AR75 | 87.7 | MSPN+* |
| Pose Estimation | COCO test-challenge | ARL | 83.2 | MSPN+* |
| Pose Estimation | COCO test-challenge | ARM | 77.5 | MSPN+* |