Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DETRPose: Real-time end-to-end transformer model for multi-person pose estimation

Sebastian Janampa, Marios Pattichis

2025-06-16 · Quantization · Pose Estimation · Multi-Person Pose Estimation · 2D Pose Estimation
Paper · PDF · Code (official)

Abstract

Multi-person pose estimation (MPPE) localizes the keypoints of every person in an image and is a fundamental task for many applications in computer vision and virtual reality. However, no existing transformer-based model performs MPPE in real time. This paper presents a family of transformer-based models capable of real-time multi-person 2D pose estimation. Our approach uses a modified decoder architecture and keypoint similarity metrics to generate both positive and negative queries, thereby improving the quality of the queries selected within the architecture. Compared to state-of-the-art models, our proposed models train much faster, requiring 5 to 10 times fewer epochs, and achieve competitive inference times without relying on quantization libraries to speed up the model. Furthermore, they match or outperform alternative models, often with significantly fewer parameters.
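The keypoint-similarity-based query labeling described in the abstract can be illustrated with an OKS-style score (Object Keypoint Similarity, the standard COCO/CrowdPose matching measure). This is a minimal sketch, not the paper's implementation: the per-keypoint constant `k` and the positive threshold `pos_thr` are illustrative assumptions, not DETRPose's exact values.

```python
import numpy as np

def oks(pred, gt, visibility, area, k=None):
    """OKS-style similarity between a predicted and a ground-truth pose.

    pred, gt: arrays of shape (K, 2) with keypoint coordinates.
    visibility: array of shape (K,); keypoints with value > 0 are labeled.
    area: object scale squared (e.g. bounding-box area).
    k: per-keypoint falloff constants (assumed uniform here for illustration).
    """
    if k is None:
        k = np.full(pred.shape[0], 0.05)        # assumed constant, not the paper's
    d2 = np.sum((pred - gt) ** 2, axis=1)       # squared keypoint distances
    e = d2 / (2.0 * area * k ** 2 + np.finfo(float).eps)
    vis = visibility > 0
    if vis.sum() == 0:
        return 0.0
    return float(np.exp(-e)[vis].mean())        # average over labeled keypoints

def label_queries(pred_poses, gt_pose, visibility, area, pos_thr=0.5):
    """Split candidate query poses into positives and negatives by OKS
    against one ground-truth pose (pos_thr is an assumed threshold)."""
    scores = [oks(p, gt_pose, visibility, area) for p in pred_poses]
    pos = [i for i, s in enumerate(scores) if s >= pos_thr]
    neg = [i for i, s in enumerate(scores) if s < pos_thr]
    return pos, neg
```

A perfect prediction scores 1.0 and lands in the positive set; a far-off pose scores near 0 and becomes a negative query, which is the kind of positive/negative split the abstract describes.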

Results

Pose Estimation on CrowdPose:

Model      | AP Easy | AP Medium | AP Hard | mAP @0.5:0.95
DETRPose-X | 81.3    | 75.7      | 68.1    | 75.1
DETRPose-L | 79.5    | 74.0      | 66.1    | 73.3
DETRPose-M | 78.6    | 72.6      | 64.5    | 72.0
DETRPose-S | 74.7    | 68.1      | 59.3    | 67.4
DETRPose-N | 65.0    | 56.6      | 46.6    | 56.0
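The per-model numbers above can be collected into a small structure for comparison across variants. A sketch (the values are copied from the table; the AP Hard and AP Medium entries for DETRPose-N appear in the source with a dropped decimal point, read here as 46.6 and 56.6):

```python
# CrowdPose results per DETRPose variant, from the table above.
RESULTS = {
    "DETRPose-X": {"AP Easy": 81.3, "AP Medium": 75.7, "AP Hard": 68.1, "mAP": 75.1},
    "DETRPose-L": {"AP Easy": 79.5, "AP Medium": 74.0, "AP Hard": 66.1, "mAP": 73.3},
    "DETRPose-M": {"AP Easy": 78.6, "AP Medium": 72.6, "AP Hard": 64.5, "mAP": 72.0},
    "DETRPose-S": {"AP Easy": 74.7, "AP Medium": 68.1, "AP Hard": 59.3, "mAP": 67.4},
    # 56.6 / 46.6 assumed from the garbled source values 566 / 466
    "DETRPose-N": {"AP Easy": 65.0, "AP Medium": 56.6, "AP Hard": 46.6, "mAP": 56.0},
}

def best_model(metric):
    """Return the variant with the highest score on the given metric."""
    return max(RESULTS, key=lambda m: RESULTS[m][metric])
```

As expected for a model family scaled from N to X, the largest variant leads on every metric, e.g. `best_model("mAP")` returns `"DETRPose-X"`.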

Related Papers

- Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
- An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
- Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
- Angle Estimation of a Single Source with Massive Uniform Circular Arrays (2025-07-17)
- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
- From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)