Yuanhao Cai, Zhicheng Wang, Zhengxiong Luo, Binyi Yin, Angang Du, Haoqian Wang, Xiangyu Zhang, Xinyu Zhou, Erjin Zhou, Jian Sun
In this paper, we propose a novel method called Residual Steps Network (RSN). RSN efficiently aggregates features with the same spatial size (intra-level features) to obtain delicate local representations, which retain rich low-level spatial information and result in precise keypoint localization. Additionally, we observe that the output features contribute differently to final performance. To address this, we propose an efficient attention mechanism, Pose Refine Machine (PRM), to make a trade-off between local and global representations in the output features and further refine the keypoint locations. Our approach won first place in the COCO Keypoint Challenge 2019 and achieves state-of-the-art results on both the COCO and MPII benchmarks, without using extra training data or pretrained models. Our single model achieves 78.6 on COCO test-dev and 93.0 on the MPII test dataset. Ensembled models achieve 79.2 on COCO test-dev and 77.1 on the COCO test-challenge dataset. The source code is publicly available for further research at https://github.com/caiyuanhao1998/RSN/
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | COCO test-dev | AP | 79.2 | 4xRSN-50 (ensemble) |
| Pose Estimation | COCO test-dev | AP50 | 94.4 | 4xRSN-50 (ensemble) |
| Pose Estimation | COCO test-dev | AP75 | 87.1 | 4xRSN-50 (ensemble) |
| Pose Estimation | COCO test-dev | APL | 76.1 | 4xRSN-50 (ensemble) |
| Pose Estimation | COCO test-dev | APM | 83.8 | 4xRSN-50 (ensemble) |
| Pose Estimation | COCO test-dev | AR | 84.1 | 4xRSN-50 (ensemble) |
| Pose Estimation | COCO test-dev | AP | 78.6 | 4xRSN-50 |
| Pose Estimation | COCO test-dev | AP50 | 94.3 | 4xRSN-50 |
| Pose Estimation | COCO test-dev | AP75 | 86.6 | 4xRSN-50 |
| Pose Estimation | COCO test-dev | APL | 75.5 | 4xRSN-50 |
| Pose Estimation | COCO test-dev | APM | 83.3 | 4xRSN-50 |
| Pose Estimation | COCO test-dev | AR | 83.8 | 4xRSN-50 |
| Pose Estimation | MPII Human Pose | PCKh@0.5 | 93.0 | 4xRSN-50 |
| Pose Estimation | MPII Single Person | PCKh@0.5 | 93.0 | 4xRSN-50 |
| Pose Estimation | COCO (Common Objects in Context) | Test AP | 78.6 | 4xRSN-50(384×288) |
| Pose Estimation | COCO test-challenge | AP | 77.1 | 4×RSN-50 |
| Pose Estimation | COCO test-challenge | AP50 | 93.3 | 4×RSN-50 |
| Pose Estimation | COCO test-challenge | AP75 | 83.6 | 4×RSN-50 |
| Pose Estimation | COCO test-challenge | APL | 82.6 | 4×RSN-50 |
| Pose Estimation | COCO test-challenge | AR | 82.6 | 4×RSN-50 |
| Pose Estimation | COCO test-challenge | AR50 | 96.1 | 4×RSN-50 |
| Pose Estimation | COCO test-challenge | AR75 | 88.2 | 4×RSN-50 |
| Pose Estimation | COCO test-challenge | ARL | 88.7 | 4×RSN-50 |
| Pose Estimation | COCO test-challenge | ARM | 78 | 4×RSN-50 |
| Pose Estimation | COCO (Common Objects in Context) | AP | 0.792 | RSN |
| Multi-Person Pose Estimation | COCO (Common Objects in Context) | AP | 0.792 | RSN |
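
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of the two per-instance quantities behind them: PCKh@0.5 (MPII) and object keypoint similarity, OKS, which COCO AP thresholds (AP50 uses OKS ≥ 0.5, AP75 uses OKS ≥ 0.75). It is not the official evaluation code of either benchmark. The function names `pckh` and `oks`, the argument layout, and the toy coordinates are illustrative assumptions; `head_size` is assumed to be the head-segment length supplied by the caller, and `sigmas` the per-keypoint constants published with the dataset.

```python
import math


def pckh(pred, gt, head_size, alpha=0.5):
    """Percentage of keypoints whose error is within alpha * head_size.

    pred, gt: lists of (x, y) tuples, one per keypoint (hypothetical layout).
    head_size: normalizing head-segment length, assumed given by the caller.
    """
    correct = 0
    for (px, py), (gx, gy) in zip(pred, gt):
        if math.hypot(px - gx, py - gy) <= alpha * head_size:
            correct += 1
    return 100.0 * correct / len(gt)


def oks(pred, gt, visible, area, sigmas):
    """Object keypoint similarity, averaged over labeled keypoints.

    visible: per-keypoint flags; keypoints with flag <= 0 are skipped.
    area: object segment area, used as the squared scale.
    sigmas: per-keypoint falloff constants (k = 2 * sigma).
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        count += 1
    return total / count if count else 0.0


# Toy data: four keypoints, one of which is off by 6 pixels.
gt = [(10, 10), (20, 20), (30, 30), (40, 40)]
pred = [(10, 10), (21, 20), (30, 36), (40, 40)]

score_pckh = pckh(pred, gt, head_size=10.0)   # threshold is 5 pixels
score_oks = oks(pred, gt, [2, 2, 2, 2], area=100.0, sigmas=[0.1] * 4)
perfect_oks = oks(gt, gt, [2, 2, 2, 2], area=100.0, sigmas=[0.1] * 4)
```

In this toy case three of the four keypoints fall within the 5-pixel threshold, so `score_pckh` is 75.0, and a prediction identical to the ground truth gives an OKS of exactly 1.0. The benchmark numbers in the table aggregate such per-instance scores over the whole test set.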