Bruno Artacho, Andreas Savakis
We propose BAPose, a novel bottom-up approach that achieves state-of-the-art results for multi-person pose estimation. Our end-to-end trainable framework leverages a disentangled multi-scale waterfall architecture and incorporates adaptive convolutions to infer keypoints more precisely in crowded scenes with occlusions. The multi-scale representations obtained by the disentangled waterfall module in BAPose combine the efficiency of progressive filtering in a cascade architecture with multi-scale fields-of-view comparable to those of spatial pyramid configurations. Our results on the challenging COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust framework for multi-person pose estimation, achieving significant improvements over state-of-the-art accuracy.
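The contrast between a cascade ("waterfall") and a spatial pyramid can be illustrated with simple receptive-field arithmetic for dilated convolutions. The sketch below is not the BAPose implementation: the function names, the 3x3 kernel, and the dilation rates are assumptions chosen only for illustration. It relies on the standard fact that a dilated convolution with kernel size k and dilation d spans (k - 1) * d + 1 pixels, and that the fields-of-view of stacked stride-1 convolutions accumulate.

```python
def dilated_fov(kernel, dilation):
    """Field-of-view along one axis of a single dilated convolution."""
    return (kernel - 1) * dilation + 1


def parallel_fovs(kernel, rates):
    """Spatial-pyramid style: each branch reads the same input independently."""
    return [dilated_fov(kernel, r) for r in rates]


def cascade_fovs(kernel, rates):
    """Cascade style: each stage filters the previous stage's output, and
    every intermediate output is kept as its own scale."""
    fovs = []
    fov = 1  # a single input pixel before any filtering
    for r in rates:
        fov += (kernel - 1) * r  # stride-1 stages: fields-of-view add up
        fovs.append(fov)
    return fovs


# Hypothetical dilation rates, not taken from the paper.
RATES = (6, 12, 18, 24)

print("parallel:", parallel_fovs(3, RATES))
print("cascade: ", cascade_fovs(3, RATES))
```

Under these assumed rates, both layouts yield one field-of-view per scale, but the cascade reuses each stage's output as the next stage's input instead of filtering the input separately in every branch.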
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | COCO (Common Objects in Context) | AP | 0.727 | BAPose |
| Pose Estimation | COCO (Common Objects in Context) | Test AP | 71.2 | BAPose |
| Pose Estimation | COCO (Common Objects in Context) | Validation AP | 72.7 | BAPose |
| Pose Estimation | CrowdPose | AP Easy | 79.9 | BAPose (W32) |
| Pose Estimation | CrowdPose | AP Hard | 61.3 | BAPose (W32) |
| Pose Estimation | CrowdPose | AP Medium | 73.4 | BAPose (W32) |
| Pose Estimation | CrowdPose | mAP @0.5:0.95 | 72.2 | BAPose (W32) |
| Multi-Person Pose Estimation | COCO (Common Objects in Context) | AP | 0.727 | BAPose |
| Multi-Person Pose Estimation | COCO (Common Objects in Context) | Test AP | 71.2 | BAPose |
| Multi-Person Pose Estimation | COCO (Common Objects in Context) | Validation AP | 72.7 | BAPose |
| Multi-Person Pose Estimation | CrowdPose | AP Easy | 79.9 | BAPose (W32) |
| Multi-Person Pose Estimation | CrowdPose | AP Hard | 61.3 | BAPose (W32) |
| Multi-Person Pose Estimation | CrowdPose | AP Medium | 73.4 | BAPose (W32) |
| Multi-Person Pose Estimation | CrowdPose | mAP @0.5:0.95 | 72.2 | BAPose (W32) |