Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Multi-Instance Pose Networks: Rethinking Top-Down Pose Estimation

Rawal Khirodkar, Visesh Chari, Amit Agrawal, Ambrish Tyagi

2021-01-27 | ICCV 2021 10 | 2D Human Pose Estimation, Pose Estimation, Multi-Person Pose Estimation, Keypoint Detection

Abstract

A key assumption of top-down human pose estimation approaches is their expectation of having a single person/instance present in the input bounding box. This often leads to failures in crowded scenes with occlusions. We propose a novel solution to overcome the limitations of this fundamental assumption. Our Multi-Instance Pose Network (MIPNet) allows for predicting multiple 2D pose instances within a given bounding box. We introduce a Multi-Instance Modulation Block (MIMB) that can adaptively modulate channel-wise feature responses for each instance and is parameter efficient. We demonstrate the efficacy of our approach by evaluating on COCO, CrowdPose, and OCHuman datasets. Specifically, we achieve 70.0 AP on CrowdPose and 42.5 AP on OCHuman test sets, a significant improvement of 2.4 AP and 6.5 AP over the prior art, respectively. When using ground truth bounding boxes for inference, MIPNet achieves an improvement of 0.7 AP on COCO, 0.9 AP on CrowdPose, and 9.1 AP on OCHuman validation sets compared to HRNet. Interestingly, when fewer, high confidence bounding boxes are used, HRNet's performance degrades (by 5 AP) on OCHuman, whereas MIPNet maintains a relatively stable performance (drop of 1 AP) for the same inputs.
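
The abstract describes the MIMB only as a block that adaptively modulates channel-wise feature responses for each instance. The sketch below illustrates that general idea under an explicit assumption: it uses squeeze-and-excitation style gating with one small weight matrix per instance. It is not the authors' implementation, and the names `squeeze`, `modulate`, and `gate_weights` are hypothetical.

```python
import math

def squeeze(feature_map):
    """Global average pooling: one scalar per channel (feature_map[c][y][x])."""
    return [sum(map(sum, channel)) / (len(channel) * len(channel[0]))
            for channel in feature_map]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def modulate(feature_map, gate_weights, instance_idx):
    """Rescale each channel with gates computed from instance-specific weights.

    gate_weights[k] is a hypothetical C x C matrix for instance k (an assumption
    of this sketch, not a detail taken from the paper).
    """
    pooled = squeeze(feature_map)
    weights = gate_weights[instance_idx]
    # One gate in (0, 1) per channel, driven by the pooled channel statistics.
    gates = [sigmoid(sum(w * p for w, p in zip(row, pooled))) for row in weights]
    return [[[g * v for v in row] for row in channel]
            for g, channel in zip(gates, feature_map)]

# Toy feature map with 2 channels of size 2x2, shared by both instances.
features = [[[1.0, 2.0], [3.0, 4.0]],   # channel 0
            [[0.5, 0.5], [0.5, 0.5]]]   # channel 1
gate_weights = [
    [[0.2, 0.0], [0.0, 0.2]],    # instance 0
    [[-0.2, 0.0], [0.0, 1.0]],   # instance 1
]
primary = modulate(features, gate_weights, 0)
secondary = modulate(features, gate_weights, 1)
```

In this sketch both instances read the same input features, and only the small gating matrices differ, so the same bounding box yields two differently weighted feature maps from which two poses could be decoded.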

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | OCHuman | Test AP | 42.5 | MIPNet (HRNet-W48) |
| Pose Estimation | OCHuman | Validation AP | 42 | MIPNet (HRNet-W48) |
| Pose Estimation | OCHuman | Test AP | 37.2 | HRNet-W48 |
| Pose Estimation | OCHuman | Validation AP | 37.8 | HRNet-W48 |
| Pose Estimation | OCHuman | AP50 | 89.7 | MIPNet (gt-bb) |
| Pose Estimation | OCHuman | AP75 | 80.1 | MIPNet (gt-bb) |
| Pose Estimation | OCHuman | Validation AP | 74.1 | MIPNet (gt-bb) |
| Pose Estimation | COCO test-dev | AP | 75.7 | MIPNet |
| Pose Estimation | COCO test-dev | AP50 | 92.4 | MIPNet |
| Pose Estimation | COCO test-dev | AP75 | 83.3 | MIPNet |
| Pose Estimation | COCO test-dev | APL | 81.2 | MIPNet |
| Pose Estimation | COCO test-dev | APM | 71.4 | MIPNet |
| Pose Estimation | COCO test-dev | AR | 80.5 | MIPNet |
| Pose Estimation | COCO (Common Objects in Context) | Test AP | 75.7 | MIPNet (384x288) |
| Pose Estimation | COCO (Common Objects in Context) | Validation AP | 76.3 | MIPNet (384x288) |
| Pose Estimation | CrowdPose | AP | 70 | MIPNet (HRNet-W48) |
| Pose Estimation | CrowdPose | APM | 71.1 | MIPNet (HRNet-W48) |
| Pose Estimation | CrowdPose | AP Easy | 78.1 | MIPNet (HRNet-W48) |
| Pose Estimation | CrowdPose | AP Medium | 71.1 | MIPNet (HRNet-W48) |
| Pose Estimation | CrowdPose | AP Hard | 59.4 | MIPNet (HRNet-W48) |
| Pose Estimation | CrowdPose | mAP @0.5:0.95 | 70 | MIPNet (HRNet-W48) |
| 2D Human Pose Estimation | OCHuman | Test AP | 42.5 | MIPNet (HRNet-W48) |
| 2D Human Pose Estimation | OCHuman | Validation AP | 42 | MIPNet (HRNet-W48) |
| 2D Human Pose Estimation | OCHuman | Test AP | 37.2 | HRNet-W48 |
| 2D Human Pose Estimation | OCHuman | Validation AP | 37.8 | HRNet-W48 |
| Multi-Person Pose Estimation | CrowdPose | AP Easy | 78.1 | MIPNet (HRNet-W48) |
| Multi-Person Pose Estimation | CrowdPose | AP Medium | 71.1 | MIPNet (HRNet-W48) |
| Multi-Person Pose Estimation | CrowdPose | AP Hard | 59.4 | MIPNet (HRNet-W48) |
| Multi-Person Pose Estimation | CrowdPose | mAP @0.5:0.95 | 70 | MIPNet (HRNet-W48) |
| Multi-Person Pose Estimation | OCHuman | AP50 | 89.7 | MIPNet (gt-bb) |
| Multi-Person Pose Estimation | OCHuman | AP75 | 80.1 | MIPNet (gt-bb) |
| Multi-Person Pose Estimation | OCHuman | Validation AP | 74.1 | MIPNet (gt-bb) |
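
The OCHuman rows for HRNet-W48 and MIPNet (HRNet-W48) can be compared directly. The short sketch below computes the per-split AP difference from the listed values only; the helper name `ap_gain` and the dictionary layout are illustrative assumptions. These differences need not match the figures quoted in the abstract, which are measured against prior art or with ground truth bounding boxes.

```python
# AP values copied from the OCHuman rows of the results listing.
ochuman_ap = {
    "HRNet-W48": {"test": 37.2, "val": 37.8},
    "MIPNet (HRNet-W48)": {"test": 42.5, "val": 42.0},
}

def ap_gain(results, model, baseline):
    """Per-split AP difference (model minus baseline), rounded to one decimal."""
    return {split: round(results[model][split] - results[baseline][split], 1)
            for split in results[model]}

gains = ap_gain(ochuman_ap, "MIPNet (HRNet-W48)", "HRNet-W48")
print(gains)  # {'test': 5.3, 'val': 4.2}
```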

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)