Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang
This is an official PyTorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from the other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through superior pose estimation results on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. The code and models are publicly available at https://github.com/leoxiaobin/deep-high-resolution-net.pytorch.
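The repeated multi-scale fusion described above can be illustrated with a dependency-free toy. This is only a sketch under loud assumptions: it uses 1D lists instead of 2D feature maps, and it replaces the learned convolutions of the real network with fixed pair averaging (for downsampling) and nearest-neighbour repetition (for upsampling). The helper names `downsample`, `upsample`, `resample`, and `fuse` are hypothetical and do not come from the repository.

```python
def downsample(x):
    """Halve the resolution by averaging adjacent pairs.

    Assumption: stands in for the learned strided convolution of a real network.
    """
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x):
    """Double the resolution by nearest-neighbour repetition."""
    return [v for v in x for _ in range(2)]


def resample(x, src, dst):
    """Bring branch `src` to the resolution of branch `dst`.

    Branch k holds a signal of length N / 2**k, so a larger index
    means a lower resolution.
    """
    while src < dst:
        x, src = downsample(x), src + 1
    while src > dst:
        x, src = upsample(x), src - 1
    return x


def fuse(branches):
    """One fusion step: each branch becomes the sum of all branches,
    each resampled to that branch's own resolution."""
    fused = []
    for dst in range(len(branches)):
        resampled = [resample(b, src, dst) for src, b in enumerate(branches)]
        fused.append([sum(vals) for vals in zip(*resampled)])
    return fused


# Three parallel branches at full, half, and quarter resolution.
high = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
branches = [high, downsample(high), downsample(downsample(high))]

fused = fuse(branches)
print([len(b) for b in fused])  # every branch keeps its own resolution
print(fused[0])                 # the high-resolution branch now carries low-resolution context
```

The point of the sketch is structural: the high-resolution branch is never discarded and rebuilt. Each fusion step enriches it with information from the lower-resolution branches, and applying `fuse` repeatedly mirrors the "over and over" exchange in the text.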
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | COCO val2017 | AP | 75.3 | HRNet (256x192) |
| Pose Estimation | AIC | AP | 33.5 | HRNet (HRNet-w48) |
| Pose Estimation | AIC | AP50 | 78 | HRNet (HRNet-w48) |
| Pose Estimation | AIC | AP75 | 23.6 | HRNet (HRNet-w48) |
| Pose Estimation | AIC | AR | 37.9 | HRNet (HRNet-w48) |
| Pose Estimation | AIC | AR50 | 80 | HRNet (HRNet-w48) |
| Pose Estimation | AIC | AP | 32.3 | HRNet (HRNet-w32) |
| Pose Estimation | AIC | AP50 | 76.2 | HRNet (HRNet-w32) |
| Pose Estimation | AIC | AP75 | 21.9 | HRNet (HRNet-w32) |
| Pose Estimation | AIC | AR | 36.6 | HRNet (HRNet-w32) |
| Pose Estimation | AIC | AR50 | 78.9 | HRNet (HRNet-w32) |
| Pose Estimation | COCO test-dev | AP | 77 | HRNet-W48 + extra data |
| Pose Estimation | COCO test-dev | AP50 | 92.7 | HRNet-W48 + extra data |
| Pose Estimation | COCO test-dev | AP75 | 84.5 | HRNet-W48 + extra data |
| Pose Estimation | COCO test-dev | APL | 83.1 | HRNet-W48 + extra data |
| Pose Estimation | COCO test-dev | APM | 73.4 | HRNet-W48 + extra data |
| Pose Estimation | COCO test-dev | AR | 82 | HRNet-W48 + extra data |
| Pose Estimation | MPII Human Pose | PCKh-0.5 | 92.3 | HRNet-W32 |
| Pose Estimation | BRACE | Average Precision | 0.357 | HRNet fine-tuned on BRACE |
| Pose Estimation | BRACE | Average Recall | 0.445 | HRNet fine-tuned on BRACE |
| Pose Estimation | BRACE | Average Precision | 0.158 | HRNet pre-trained on COCO |
| Pose Estimation | BRACE | Average Recall | 0.202 | HRNet pre-trained on COCO |
| Pose Estimation | COCO test-dev | AP50 | 92.7 | HRNet* |
| Pose Estimation | COCO test-dev | AP75 | 84.5 | HRNet* |
| Pose Estimation | COCO test-dev | APL | 83.1 | HRNet* |
| Pose Estimation | COCO test-dev | APM | 73.4 | HRNet* |
| Pose Estimation | COCO test-dev | AR | 82 | HRNet* |
| Pose Estimation | COCO test-dev | AP50 | 92.5 | HRNet |
| Pose Estimation | COCO test-dev | AP75 | 83.3 | HRNet |
| Pose Estimation | COCO test-dev | APL | 81.5 | HRNet |
| Pose Estimation | COCO test-dev | APM | 71.9 | HRNet |
| Pose Estimation | COCO test-dev | AR | 80.5 | HRNet |
| Pose Estimation | COCO (Common Objects in Context) | Test AP | 75.5 | HRNet-48(384x288) |
| Pose Estimation | COCO (Common Objects in Context) | Validation AP | 76.3 | HRNet-48(384x288) |
| Pose Estimation | COCO (Common Objects in Context) | Validation AP | 75.8 | HRNet-32 |
| Pose Estimation | HARPER | Average MPJPE (mm) | 151 | HRNet + Depth |
| 2D Pose Estimation | HARPER | PCK | 868 | HRNet |
| Pose Tracking | PoseTrack2017 | MOTA | 57.93 | HRNet-W48 COCO |
| Pose Tracking | PoseTrack2017 | mAP | 74.95 | HRNet-W48 COCO |
| 2D Human Pose Estimation | Human-Art | AP | 0.417 | HRNet-w48 |
| 2D Human Pose Estimation | Human-Art | AP (gt bbox) | 0.769 | HRNet-w48 |
| 2D Human Pose Estimation | Human-Art | AP | 0.399 | HRNet-w32 |
| 2D Human Pose Estimation | Human-Art | AP (gt bbox) | 0.754 | HRNet-w32 |
| 2D Human Pose Estimation | COCO-WholeBody | WB | 43.2 | HRNet |
| 2D Human Pose Estimation | COCO-WholeBody | body | 65.9 | HRNet |
| 2D Human Pose Estimation | COCO-WholeBody | face | 52.3 | HRNet |
| 2D Human Pose Estimation | COCO-WholeBody | foot | 31.4 | HRNet |
| 2D Human Pose Estimation | COCO-WholeBody | hand | 30 | HRNet |
| 3D Pose Estimation | HARPER | Average MPJPE (mm) | 151 | HRNet + Depth |
| 2D Classification | HARPER | PCK | 868 | HRNet |
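For readers unfamiliar with the PCKh-0.5 metric reported for MPII in the table, the following is a minimal sketch of how such a score is commonly computed: a predicted joint counts as correct when its distance to the ground-truth joint is at most 0.5 times the head segment length. The function name `pckh`, its parameters, and the sample coordinates are all hypothetical illustrations, not the official evaluation code, which also handles per-joint visibility and per-image head sizes.

```python
import math


def pckh(pred, gt, head_size, alpha=0.5):
    """Percentage of keypoints within alpha * head_size of the ground truth.

    Assumption: a single person with one head size and all joints annotated.
    """
    correct = 0
    for (px, py), (gx, gy) in zip(pred, gt):
        if math.hypot(px - gx, py - gy) <= alpha * head_size:
            correct += 1
    return 100.0 * correct / len(gt)


# Made-up coordinates, for illustration only.
gt = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]
pred = [(10.0, 11.0), (20.0, 26.0), (33.0, 34.0), (40.0, 40.0)]

score = pckh(pred, gt, head_size=10.0)
print(score)  # 75.0: three of the four joints fall within 5 pixels
```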