Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity

Mu Zhou, Lucas Stoffl, Mackenzie Weygandt Mathis, Alexander Mathis

2023-06-13 · Pose Estimation · Multi-Person Pose Estimation · Animal Pose Estimation

Paper · PDF · Code (official)

Abstract

Frequent interactions between individuals are a fundamental challenge for pose estimation algorithms. Current pipelines either use an object detector together with a pose estimator (top-down approach), or localize all body parts first and then link them to predict the pose of individuals (bottom-up). Yet, when individuals closely interact, top-down methods are ill-defined due to overlapping individuals, and bottom-up methods often falsely infer connections to distant body parts. Thus, we propose a novel pipeline called bottom-up conditioned top-down pose estimation (BUCTD) that combines the strengths of bottom-up and top-down methods. Specifically, we propose to use a bottom-up model as the detector, which in addition to an estimated bounding box provides a pose proposal that is fed as a condition to an attention-based top-down model. We demonstrate the performance and efficiency of our approach on animal and human pose estimation benchmarks. On CrowdPose and OCHuman, we outperform previous state-of-the-art models by a significant margin. We achieve 78.5 AP on CrowdPose and 48.5 AP on OCHuman, an improvement of 8.6% and 7.8% over the prior art, respectively. Furthermore, we show that our method strongly improves the performance on multi-animal benchmarks involving fish and monkeys. The code is available at https://github.com/amathislab/BUCTD
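The two-stage flow described in the abstract (bottom-up detector emits a bounding box plus a pose proposal; a conditioned top-down model refines it) can be sketched in plain Python. This is an illustrative stand-in, not the authors' API: the names `Proposal`, `bottom_up_detector`, and `conditioned_top_down` are hypothetical, and trivial functions replace the actual bottom-up network (e.g. DLCRNet or CID) and the attention-based top-down network.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Keypoint = Tuple[float, float]


@dataclass
class Proposal:
    """Output of the bottom-up stage: a box AND a pose proposal."""
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1)
    pose: List[Keypoint]                     # keypoints in image coords


def bbox_from_pose(pose: List[Keypoint], margin: float = 10.0):
    # Derive a bounding box from the proposed keypoints, padded by a margin.
    xs = [x for x, _ in pose]
    ys = [y for _, y in pose]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


def bottom_up_detector(poses: List[List[Keypoint]]) -> List[Proposal]:
    # Stand-in for the bottom-up model: one proposal per detected individual.
    return [Proposal(bbox_from_pose(p), p) for p in poses]


def conditioned_top_down(bbox, condition_pose, refine: Callable[[Keypoint], Keypoint]):
    # Stand-in for the attention-based top-down model: it sees the crop
    # (here just the bbox) AND the pose condition, and refines each keypoint.
    return [refine(kp) for kp in condition_pose]


def buctd_pipeline(poses, refine=lambda kp: kp):
    # Full pipeline: bottom-up proposals -> conditioned top-down refinement.
    return [conditioned_top_down(p.bbox, p.pose, refine) for p in bottom_up_detector(poses)]
```

The key design point this sketch illustrates is that the top-down stage is not handed a bare crop: it also receives the full pose proposal, so in crowded scenes, where a bounding box alone is ambiguous about which individual it covers, the condition carries the identity information.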

Results

Task | Dataset | Metric | Value | Model
Pose Estimation | OCHuman | Test AP | 47.2 | BUCTD (CID-W32)
Pose Estimation | OCHuman | Validation AP | 47.7 | BUCTD (CID-W32)
Pose Estimation | COCO (Common Objects in Context) | AP | 77.8 | BUCTD (PETR, with generative sampling)
Pose Estimation | COCO (Common Objects in Context) | APL | 83.7 | BUCTD (PETR, with generative sampling)
Pose Estimation | COCO (Common Objects in Context) | APM | 74.2 | BUCTD (PETR, with generative sampling)
Pose Estimation | CrowdPose | AP | 78.5 | BUCTD-W48 (w/ cond. input from PETR, and generative sampling)
Pose Estimation | CrowdPose | AP Easy | 83.9 | BUCTD-W48 (w/ cond. input from PETR, and generative sampling)
Pose Estimation | CrowdPose | AP Hard | 72.3 | BUCTD-W48 (w/ cond. input from PETR, and generative sampling)
Pose Estimation | CrowdPose | AP Medium | 79.0 | BUCTD-W48 (w/ cond. input from PETR, and generative sampling)
Pose Estimation | CrowdPose | AP | 76.7 | BUCTD-W48 (w/ cond. input from PETR)
Pose Estimation | CrowdPose | AP | 72.9 | BUCTD-W48
Pose Estimation | Fish-100 | mAP | 89.1 | HRNet-W48 + Faster R-CNN
Pose Estimation | Fish-100 | mAP | 88.7 | BUCTD-preNet-W48 (DLCRNet)
Pose Estimation | Fish-100 | mAP | 88.0 | BUCTD-preNet-W48 (CID-W32)
Pose Estimation | Marmoset-8K | mAP | 93.3 | BUCTD-preNet-W48 (CID-W32)
Pose Estimation | Marmoset-8K | mAP | 92.5 | CID-W32
Pose Estimation | Marmoset-8K | mAP | 91.6 | BUCTD-CoAM-W48 (DLCRNet)
Pose Estimation | TriMouse-161 | mAP | 99.1 | BUCTD-CoAM-W48 (DLCRNet)
Pose Estimation | TriMouse-161 | mAP | 95.8 | DLCRNet
Pose Estimation | TriMouse-161 | mAP | 86.8 | CID-W32
Animal Pose Estimation | Fish-100 | mAP | 89.1 | HRNet-W48 + Faster R-CNN
Animal Pose Estimation | Fish-100 | mAP | 88.7 | BUCTD-preNet-W48 (DLCRNet)
Animal Pose Estimation | Fish-100 | mAP | 88.0 | BUCTD-preNet-W48 (CID-W32)
Animal Pose Estimation | Marmoset-8K | mAP | 93.3 | BUCTD-preNet-W48 (CID-W32)
Animal Pose Estimation | Marmoset-8K | mAP | 92.5 | CID-W32
Animal Pose Estimation | Marmoset-8K | mAP | 91.6 | BUCTD-CoAM-W48 (DLCRNet)
Animal Pose Estimation | TriMouse-161 | mAP | 99.1 | BUCTD-CoAM-W48 (DLCRNet)
Animal Pose Estimation | TriMouse-161 | mAP | 95.8 | DLCRNet
Animal Pose Estimation | TriMouse-161 | mAP | 86.8 | CID-W32
Multi-Person Pose Estimation | CrowdPose | AP Easy | 83.9 | BUCTD-W48 (w/ cond. input from PETR, and generative sampling)
Multi-Person Pose Estimation | CrowdPose | AP Hard | 72.3 | BUCTD-W48 (w/ cond. input from PETR, and generative sampling)
Multi-Person Pose Estimation | CrowdPose | AP Medium | 79.0 | BUCTD-W48 (w/ cond. input from PETR, and generative sampling)
Multi-Person Pose Estimation | CrowdPose | mAP @0.5:0.95 | 78.5 | BUCTD-W48 (w/ cond. input from PETR, and generative sampling)
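The 8.6% and 7.8% gains quoted in the abstract are relative AP improvements over the prior state of the art. A one-line helper makes the arithmetic explicit (the function name is illustrative, and the 72.3 baseline in the example is inferred from the abstract's own numbers, not stated on this page):

```python
def relative_improvement(new_ap: float, old_ap: float) -> float:
    """Relative gain in percent: 100 * (new - old) / old."""
    return 100.0 * (new_ap - old_ap) / old_ap
```

For instance, a jump from roughly 72.3 AP to 78.5 AP on CrowdPose corresponds to the abstract's quoted 8.6% improvement.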

Related Papers

- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
- From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
- AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
- SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
- SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
- Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)