
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


XFormer: Fast and Accurate Monocular 3D Body Capture

Lihui Qian, Xintong Han, Faqiang Wang, Hongyu Liu, Haoye Dong, Zhiwen Li, Huawei Wei, Zhe Lin, Cheng-Bin Jin

2023-05-18
3D Human Pose Estimation

Abstract

We present XFormer, a novel human mesh and motion capture method that achieves real-time performance on consumer CPUs given only monocular images as input. The proposed network architecture contains two branches: a keypoint branch that estimates 3D human mesh vertices given 2D keypoints, and an image branch that makes predictions directly from the RGB image features. At the core of our method is a cross-modal transformer block that allows information to flow across these two branches by modeling the attention between 2D keypoint coordinates and image spatial features. This design enables us to train on various types of datasets, including images with 2D/3D annotations, images with 3D pseudo labels, and motion capture datasets that do not have associated images, which effectively improves the accuracy and generalization ability of our system. Built on a lightweight backbone (MobileNetV3), our method runs at over 30 fps on a single CPU core and still yields competitive accuracy. Furthermore, with an HRNet backbone, XFormer delivers state-of-the-art performance on the Human3.6M and 3DPW datasets.
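To make the cross-modal idea in the abstract concrete, here is a minimal sketch of generic single-head cross-attention in which keypoint tokens act as queries and image feature tokens act as keys and values. This is not the authors' block: the function names, the token counts (17 keypoints, 49 image patches), the embedding size, and the random projection weights are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Generic single-head cross-attention (illustrative, not XFormer's exact block).

    queries: (num_queries, dim) tokens from one modality (here: keypoints)
    context: (num_context, dim) tokens from the other modality (here: image patches)
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    # Scaled dot-product scores between every query and every context token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    # Each query token receives a weighted mix of context values.
    return weights @ v, weights

# Hypothetical sizes: 17 keypoint tokens, a 7x7 feature map flattened to 49 tokens.
rng = np.random.default_rng(0)
num_joints, num_patches, dim = 17, 49, 32
keypoint_tokens = rng.normal(size=(num_joints, dim))
image_tokens = rng.normal(size=(num_patches, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

fused, attn = cross_attention(keypoint_tokens, image_tokens, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Swapping the roles of the two token sets would give attention in the opposite direction, from image features to keypoints. How the paper combines the two directions is not stated in the abstract, so the sketch leaves that out.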

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | MPI-INF-3DHP | MPJPE | 109.8 | XFormer (HRNet) |
| 3D Human Pose Estimation | MPI-INF-3DHP | PA-MPJPE | 64.5 | XFormer (HRNet) |
| 3D Human Pose Estimation | 3DPW | MPJPE | 75 | XFormer (HRNet) |
| 3D Human Pose Estimation | 3DPW | MPVPE | 87.1 | XFormer (HRNet) |
| 3D Human Pose Estimation | 3DPW | PA-MPJPE | 45.7 | XFormer (HRNet) |
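
The metrics reported above can be sketched as follows, using the standard definitions: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PA-MPJPE is the same error after a rigid Procrustes alignment (rotation, uniform scale, translation). MPVPE uses the same formula applied to mesh vertices instead of joints. This is a single-pose sketch with hypothetical function names and synthetic data, not the evaluation code behind the listed numbers; benchmark protocols average over all frames and may differ in details such as root alignment.

```python
import numpy as np

def mpjpe(pred, gt):
    # Mean per-joint position error: average Euclidean distance over joints.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Similarity-align pred (J, 3) to gt (J, 3): rotation, uniform scale, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    diag = np.array([1.0, 1.0, d])
    rot = vt.T @ np.diag(diag) @ u.T
    scale = (s * diag).sum() / (p ** 2).sum()
    return scale * p @ rot.T + mu_g

def pa_mpjpe(pred, gt):
    # MPJPE measured after Procrustes alignment of the prediction.
    return mpjpe(procrustes_align(pred, gt), gt)

# Synthetic check: a prediction that differs from the ground truth only by a
# rotation, scale, and shift has a large MPJPE but a near-zero PA-MPJPE.
rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3))  # 17 joints is an arbitrary choice
theta = 0.3
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ rot_z.T + np.array([0.5, -1.0, 2.0])

print(mpjpe(pred, gt), pa_mpjpe(pred, gt))
```

Because PA-MPJPE removes global rotation, scale, and translation, it is always less than or equal to what an unaligned comparison would give for the same pose shape, which is consistent with the PA-MPJPE values above being lower than the MPJPE values on both datasets.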

Related Papers

Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images (2025-06-24)
ExtPose: Robust and Coherent Pose Estimation by Extending ViTs (2025-06-18)
PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation (2025-06-17)
Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation (2025-06-03)
UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction (2025-05-20)
PoseBench3D: A Cross-Dataset Analysis Framework for 3D Human Pose Estimation (2025-05-16)
HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation (2025-05-07)
Continuous Normalizing Flows for Uncertainty-Aware Human Pose Estimation (2025-05-04)