We propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose. Inspired by recent anchor-free object detectors, which directly regress the two corners of target bounding boxes, the proposed framework directly predicts instance-aware keypoints for all instances from a raw input image, eliminating the need for heuristic grouping in bottom-up methods and for bounding-box detection and RoI operations in top-down ones. We also propose a novel Keypoint Alignment (KPAlign) mechanism, which overcomes the main difficulty of this end-to-end framework: the lack of alignment between the convolutional features and the predictions. KPAlign improves the framework's performance by a large margin while keeping it end-to-end trainable. With non-maximum suppression (NMS) as the only post-processing step, the proposed framework can detect multi-person keypoints with or without bounding boxes in a single shot. Experiments demonstrate that the end-to-end paradigm achieves competitive or better performance than previous strong baselines from both bottom-up and top-down methods. We hope that our end-to-end approach can provide a new perspective for the human pose estimation task.
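To make the "direct regression plus NMS" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each feature-map location predicts a confidence score and per-keypoint (dx, dy) offsets relative to that location, and that duplicates are suppressed by greedy NMS on the tight box enclosing each instance's keypoints. The function names, the offset parameterization, and the box-based suppression criterion are all illustrative assumptions; KPAlign is not modeled here.

```python
import numpy as np


def decode_keypoints(locations, offsets):
    """Turn per-location offsets into absolute keypoints.

    locations: (N, 2) array of (x, y) positions in image coordinates.
    offsets:   (N, K, 2) array of predicted (dx, dy) per keypoint (assumed form).
    Returns an (N, K, 2) array of absolute keypoint coordinates.
    """
    return locations[:, None, :] + offsets


def keypoint_boxes(keypoints):
    """Tight (x1, y1, x2, y2) box around each instance's keypoints."""
    return np.concatenate([keypoints.min(axis=1), keypoints.max(axis=1)], axis=1)


def box_iou(box, boxes):
    """IoU between one box (4,) and many boxes (M, 4)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / np.maximum(area + areas - inter, 1e-9)


def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring candidates, dropping heavily overlapping ones."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[box_iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep


def detect_poses(locations, offsets, scores, iou_thresh=0.5):
    """Decode all candidates, then apply NMS as the only post-processing step."""
    keypoints = decode_keypoints(locations, offsets)
    keep = greedy_nms(keypoint_boxes(keypoints), scores, iou_thresh)
    return keypoints[keep], scores[keep], keep
```

Under these assumptions, no grouping stage or RoI operation is needed: every surviving candidate already carries a full set of keypoints for one instance.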
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | COCO test-dev | AP | 63.3 | DirectPose (ResNet-101) |
| Pose Estimation | COCO test-dev | AP50 | 86.7 | DirectPose (ResNet-101) |
| Pose Estimation | COCO test-dev | AP75 | 69.4 | DirectPose (ResNet-101) |
| Pose Estimation | COCO test-dev | APL | 71.2 | DirectPose (ResNet-101) |
| Pose Estimation | COCO test-dev | APM | 57.8 | DirectPose (ResNet-101) |
| Pose Estimation | COCO test-dev | AP | 64.8 | DirectPose (ResNet-101) |
| Pose Estimation | COCO test-dev | AP50 | 87.8 | DirectPose (ResNet-101) |
| Pose Estimation | COCO test-dev | AP75 | 71.1 | DirectPose (ResNet-101) |
| Pose Estimation | COCO test-dev | APL | 71.5 | DirectPose (ResNet-101) |
| Pose Estimation | COCO test-dev | APM | 60.4 | DirectPose (ResNet-101) |