


Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation

Hanbyul Joo, Natalia Neverova, Andrea Vedaldi

2020-04-07 | 3D Human Pose Estimation | Pose Estimation | 3D Pose Estimation

Abstract

Unlike 2D image datasets such as COCO, large-scale human datasets with 3D ground-truth annotations are very difficult to obtain in the wild. In this paper, we address this problem by augmenting existing 2D datasets with high-quality 3D pose fits. Remarkably, the resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state-of-the-art on in-the-wild benchmarks such as 3DPW. Additionally, training on our augmented data is straightforward, as it does not require mixing multiple incompatible 2D and 3D datasets or using complicated network architectures and training procedures. This simplified pipeline affords additional improvements, including injecting extreme crop augmentations to better reconstruct highly truncated people, and incorporating auxiliary inputs to improve 3D pose estimation accuracy. It also reduces the dependency on 3D datasets such as H36M that have restrictive licenses. We also use our method to introduce new benchmarks for the study of real-world challenges such as occlusions, truncations, and rare body poses. To obtain such high-quality 3D pseudo-annotations, inspired by progress in internal learning, we introduce Exemplar Fine-Tuning (EFT). EFT combines the re-projection accuracy of fitting methods like SMPLify with a 3D pose prior implicitly captured by a pre-trained 3D pose regressor network. We show that EFT produces 3D annotations that result in better downstream performance and are qualitatively preferable in an extensive human-based assessment.
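The abstract describes EFT only at a high level: a pre-trained regressor supplies the 3D prior, and a fitting step enforces 2D re-projection accuracy. The toy sketch below illustrates one possible reading of that idea, in which a copy of the regressor's weights is fine-tuned on a single exemplar using only a 2D keypoint loss. Everything in it is an assumption made for illustration and not the authors' implementation: the linear "regressor" `W_pretrained`, the feature vector `feat`, the orthographic projection, the function name `exemplar_fine_tune`, and the step size. The real method operates on a deep network and a parametric body model.

```python
import numpy as np

def exemplar_fine_tune(W_pretrained, feat, kp2d, steps=50, lr=0.1):
    """Toy exemplar fine-tuning (hypothetical, for illustration only).

    W_pretrained : (3*J, F) weights of a linear stand-in for a pose regressor
    feat         : (F,) image feature of one exemplar, assumed unit-norm
    kp2d         : (J, 2) 2D keypoint annotations for that exemplar
    Returns the fitted (J, 3) joints for this exemplar.
    """
    # Each exemplar starts from the same pre-trained weights; the copy keeps
    # the original regressor untouched for the next exemplar.
    W = W_pretrained.copy()
    J = kp2d.shape[0]
    for _ in range(steps):
        joints3d = (W @ feat).reshape(J, 3)
        # Orthographic projection (an assumption): keep x, y and drop z.
        resid = joints3d[:, :2] - kp2d
        # Gradient of the summed squared 2D error w.r.t. the predicted joints.
        grad_joints = np.zeros((J, 3))
        grad_joints[:, :2] = 2.0 * resid
        # Chain rule through the linear map: dL/dW = grad_joints (outer) feat.
        W -= lr * np.outer(grad_joints.reshape(-1), feat)
    return (W @ feat).reshape(J, 3)
```

In this toy setting the 2D loss never touches the depth coordinate, so the fitted depth is whatever the pre-trained weights predicted. That is only a crude stand-in for the "implicit prior" mentioned in the abstract.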

Results

Task | Dataset | Metric | Value | Model
3D Human Pose Estimation | MPI-INF-3DHP | PA-MPJPE | 67.5 | EFT
3D Human Pose Estimation | 3DPW | PA-MPJPE | 51.6 | EFT
Pose Estimation | MPI-INF-3DHP | PA-MPJPE | 67.5 | EFT
Pose Estimation | 3DPW | PA-MPJPE | 51.6 | EFT
3D | MPI-INF-3DHP | PA-MPJPE | 67.5 | EFT
3D | 3DPW | PA-MPJPE | 51.6 | EFT
1 Image, 2*2 Stitchi | MPI-INF-3DHP | PA-MPJPE | 67.5 | EFT
1 Image, 2*2 Stitchi | 3DPW | PA-MPJPE | 51.6 | EFT
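The metric reported in these results, PA-MPJPE, is the mean per-joint position error measured after a Procrustes (similarity) alignment of the prediction to the ground truth. The sketch below shows how such a number is commonly computed with NumPy. The function names `procrustes_align` and `pa_mpjpe` are illustrative, and the exact joint set and evaluation protocol behind the values above are not specified on this page.

```python
import numpy as np

def procrustes_align(pred, gt):
    """Similarity-align pred (J, 3) to gt (J, 3): scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance gives the optimal rotation (Kabsch).
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Optimal isotropic scale (Umeyama).
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * (p @ R.T) + mu_g

def pa_mpjpe(pred, gt):
    """Mean Euclidean joint error after Procrustes alignment (input units)."""
    aligned = procrustes_align(pred, gt)
    return np.linalg.norm(aligned - gt, axis=1).mean()
```

Because the alignment removes global scale, rotation, and translation, a prediction that differs from the ground truth only by such a transform scores approximately zero. The metric therefore measures articulated pose error rather than global placement.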

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)