Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation

Shuaitao Zhao, Kun Liu, Yuhang Huang, Qian Bao, Dan Zeng, Wu Liu

2022-09-02 · Human Detection · Pose Estimation

Paper · PDF

Abstract

Human pose estimation aims to localize the keypoints of all people in diverse scenes. Despite promising results, current approaches still face challenges. Existing top-down methods process each person individually, without modeling the interaction between different people or between a person and the scene they are situated in; consequently, human detection performance degrades under severe occlusion. On the other hand, existing bottom-up methods consider all people at the same time and capture the global knowledge of the entire image, but they are less accurate than top-down methods due to scale variation. To address these problems, we propose a novel Dual-Pipeline Integrated Transformer (DPIT) that integrates the top-down and bottom-up pipelines to explore the visual clues of different receptive fields and achieve their complementarity. Specifically, DPIT consists of two branches: the bottom-up branch processes the whole image to capture global visual information, while the top-down branch extracts local feature representations from the single-human bounding box. The extracted feature representations from the two branches are then fed into a transformer encoder to fuse the global and local knowledge interactively. Moreover, we define keypoint queries that explore both full-scene and single-human posture visual clues to realize the mutual complementarity of the two pipelines. To the best of our knowledge, this is one of the first works to integrate the bottom-up and top-down pipelines with transformers for human pose estimation. Extensive experiments on the COCO and MPII datasets demonstrate that DPIT achieves performance comparable to state-of-the-art methods.
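The fusion step described in the abstract can be sketched in a few lines of numpy: keypoint queries attend over the concatenated token sets of the two branches, so each keypoint embedding mixes full-scene and single-human evidence. The dimensions, token counts, and the single attention layer below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32            # embedding dimension (assumed)
N_GLOBAL = 49     # tokens from the full image (bottom-up branch, assumed)
N_LOCAL = 16      # tokens from one person crop (top-down branch, assumed)
N_KPTS = 17       # number of COCO keypoints

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

# Stand-ins for features extracted by the two backbones.
global_tokens = rng.standard_normal((N_GLOBAL, D))  # whole-image clues
local_tokens = rng.standard_normal((N_LOCAL, D))    # single-person clues

# Fuse: each keypoint query attends over the concatenated token set,
# mixing global (scene-level) and local (person-level) evidence.
tokens = np.concatenate([global_tokens, local_tokens], axis=0)
keypoint_queries = rng.standard_normal((N_KPTS, D))
fused = attention(keypoint_queries, tokens, tokens)

print(fused.shape)  # (17, 32): one fused embedding per keypoint
```

In the full model these fused embeddings would feed a prediction head; here the sketch only shows how a shared attention block lets the two pipelines complement each other.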

Results

Task             Dataset        Metric  Value  Model
Pose Estimation  COCO test-dev  AP      74.6   DPIT-L
Pose Estimation  COCO test-dev  AP50    91.9   DPIT-L
Pose Estimation  COCO test-dev  AP75    82.1   DPIT-L
Pose Estimation  COCO test-dev  APL     80.6   DPIT-L
Pose Estimation  COCO test-dev  APM     71.3   DPIT-L
Pose Estimation  COCO test-dev  AR      79.9   DPIT-L
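The AP/AR numbers above are derived from Object Keypoint Similarity (OKS), COCO's per-person match score between predicted and ground-truth keypoints. A minimal sketch of the OKS formula follows; for simplicity it uses a single falloff constant `kappa`, whereas the real COCO evaluation uses per-keypoint constants.

```python
import numpy as np

def oks(pred, gt, visible, area, kappa=0.1):
    """Object Keypoint Similarity for one person (simplified).

    pred, gt : (K, 2) arrays of (x, y) keypoint coordinates
    visible  : (K,) boolean mask of labeled keypoints
    area     : object scale term (s**2 in the COCO formula)
    kappa    : falloff constant (assumed single value here)
    """
    d2 = ((pred - gt) ** 2).sum(axis=1)          # squared pixel distances
    sim = np.exp(-d2 / (2 * area * kappa ** 2))  # per-keypoint similarity
    return sim[visible].mean()

gt = np.array([[10.0, 10.0], [20.0, 20.0], [30.0, 10.0]])
visible = np.array([True, True, True])

print(oks(gt, gt, visible, area=400.0))          # perfect prediction -> 1.0
```

AP is then the precision averaged over OKS thresholds from 0.50 to 0.95; AP50 and AP75 fix the threshold at 0.50 and 0.75, and APM/APL restrict evaluation to medium and large people.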

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)