TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Poseur: Direct Human Pose Regression with Transformers

Poseur: Direct Human Pose Regression with Transformers

Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, Anton Van Den Hengel

2022-01-19regression2D Human Pose EstimationPose EstimationKeypoint Detection
PaperPDFCode(official)

Abstract

We propose a direct, regression-based approach to 2D human pose estimation from single images. We formulate the problem as a sequence prediction task, which we solve using a Transformer network. This network directly learns a regression mapping from images to the keypoint coordinates, without resorting to intermediate representations such as heatmaps. This approach avoids much of the complexity associated with heatmap-based approaches. To overcome the feature misalignment issues of previous regression-based methods, we propose an attention mechanism that adaptively attends to the features that are most relevant to the target keypoints, considerably improving the accuracy. Importantly, our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints. Experiments on MS-COCO and MPII, two predominant pose-estimation datasets, demonstrate that our method significantly improves upon the state-of-the-art in regression-based pose estimation. More notably, ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.

Results

TaskDatasetMetricValueModel
Pose EstimationCOCO (Common Objects in Context)Test AP78.3Poseur(384x288)
Pose EstimationCOCO (Common Objects in Context)Validation AP79.6Poseur(384x288)
3DCOCO (Common Objects in Context)Test AP78.3Poseur(384x288)
3DCOCO (Common Objects in Context)Validation AP79.6Poseur(384x288)
1 Image, 2*2 StitchiCOCO (Common Objects in Context)Test AP78.3Poseur(384x288)
1 Image, 2*2 StitchiCOCO (Common Objects in Context)Validation AP79.6Poseur(384x288)

Related Papers

Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression2025-07-20$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning2025-07-17Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark2025-07-17DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model2025-07-17From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation2025-07-17AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability2025-07-17Neural Network-Guided Symbolic Regression for Interpretable Descriptor Discovery in Perovskite Catalysts2025-07-16Imbalanced Regression Pipeline Recommendation2025-07-16