
Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes

Hongsuk Choi, Gyeongsik Moon, JoonKyu Park, Kyoung Mu Lee

2021-04-15 · CVPR 2022
Tasks: 3D Human Pose Estimation, 2D Human Pose Estimation, Pose Estimation, 3D Multi-Person Human Pose Estimation, 3D Multi-Person Pose Estimation
Paper · PDF · Code (official)

Abstract

We consider the problem of recovering a single person's 3D human mesh from in-the-wild crowded scenes. While much progress has been made in 3D human mesh estimation, existing methods struggle when the test input contains crowded scenes. The first reason for the failure is a domain gap between training and testing data. A motion capture dataset, which provides accurate 3D labels for training, lacks crowd data and impedes a network from learning crowded scene-robust image features of a target person. The second reason is a feature processing step that spatially averages the feature map of a localized bounding box containing multiple people. Averaging the whole feature map makes a target person's features indistinguishable from those of others. We present 3DCrowdNet, which explicitly targets in-the-wild crowded scenes for the first time and estimates a robust 3D human mesh by addressing the above issues. First, we leverage 2D human pose estimation, which does not require a motion capture dataset with 3D labels for training and does not suffer from the domain gap. Second, we propose a joint-based regressor that distinguishes a target person's features from those of others. Our joint-based regressor preserves the spatial activation of the target by sampling features at the target's joint locations and regresses human model parameters from them. As a result, 3DCrowdNet learns target-focused features and effectively excludes the irrelevant features of nearby persons. We conduct experiments on various benchmarks and demonstrate the robustness of 3DCrowdNet to in-the-wild crowded scenes both quantitatively and qualitatively. The code is available at https://github.com/hongsukchoi/3DCrowdNet_RELEASE.
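
The joint-based feature sampling described in the abstract can be illustrated with a minimal sketch: instead of average-pooling the whole bounding-box feature map (which mixes in nearby people), features are bilinearly sampled at the target person's 2D joint locations. This is an illustrative sketch under assumed tensor shapes and naming, not the official 3DCrowdNet implementation.

```python
# Minimal sketch of joint-based feature sampling (illustrative; not the
# official 3DCrowdNet code). Assumes a feature map of shape (B, C, H, W)
# and the target person's 2D joints in pixel coordinates, shape (B, J, 2).
import torch
import torch.nn.functional as F

def sample_joint_features(feat_map: torch.Tensor, joints_xy: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample per-joint features instead of global average pooling.

    feat_map:  (B, C, H, W) image features of the localized bounding box.
    joints_xy: (B, J, 2) target person's 2D joint locations in pixel coords.
    returns:   (B, J, C) features sampled at the target's joints.
    """
    B, C, H, W = feat_map.shape
    # Normalize pixel coordinates to [-1, 1] as required by grid_sample.
    x = joints_xy[..., 0] / (W - 1) * 2 - 1
    y = joints_xy[..., 1] / (H - 1) * 2 - 1
    grid = torch.stack([x, y], dim=-1).unsqueeze(2)               # (B, J, 1, 2)
    sampled = F.grid_sample(feat_map, grid, align_corners=True)   # (B, C, J, 1)
    return sampled.squeeze(-1).permute(0, 2, 1)                   # (B, J, C)

# The per-joint features would then be fed to a regressor (e.g., an MLP or
# graph network) that predicts human model parameters, so activations from
# nearby people away from the target's joints are ignored.
```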

Results

Task | Dataset | Metric | Value | Model
3D Human Pose Estimation | 3DPW | MPJPE | 85.8 | 3DCrowdNet
3D Human Pose Estimation | 3DPW | MPVPE | 108.5 | 3DCrowdNet
3D Human Pose Estimation | 3DPW | PA-MPJPE | 55.8 | 3DCrowdNet
3D Human Pose Estimation | MuPoTS-3D | 3DPCK | 72.7 | 3DCrowdNet (HigherHRNet)
Pose Estimation | 3DPW | MPJPE | 85.8 | 3DCrowdNet
Pose Estimation | 3DPW | MPVPE | 108.5 | 3DCrowdNet
Pose Estimation | 3DPW | PA-MPJPE | 55.8 | 3DCrowdNet
Pose Estimation | MuPoTS-3D | 3DPCK | 72.7 | 3DCrowdNet (HigherHRNet)
3D | 3DPW | MPJPE | 85.8 | 3DCrowdNet
3D | 3DPW | MPVPE | 108.5 | 3DCrowdNet
3D | 3DPW | PA-MPJPE | 55.8 | 3DCrowdNet
3D | MuPoTS-3D | 3DPCK | 72.7 | 3DCrowdNet (HigherHRNet)
3D Multi-Person Pose Estimation | MuPoTS-3D | 3DPCK | 72.7 | 3DCrowdNet (HigherHRNet)
1 Image, 2*2 Stitchi | 3DPW | MPJPE | 85.8 | 3DCrowdNet
1 Image, 2*2 Stitchi | 3DPW | MPVPE | 108.5 | 3DCrowdNet
1 Image, 2*2 Stitchi | 3DPW | PA-MPJPE | 55.8 | 3DCrowdNet
1 Image, 2*2 Stitchi | MuPoTS-3D | 3DPCK | 72.7 | 3DCrowdNet (HigherHRNet)
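
For reference, the 3DPW metrics above are joint and vertex errors in millimetres: MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints, PA-MPJPE is the same error after Procrustes alignment, and MPVPE is the analogous mean over mesh vertices. Below is a small sketch of how MPJPE and PA-MPJPE are commonly computed; the alignment conventions are assumptions and not necessarily the exact evaluation protocol behind the numbers above.

```python
# Illustrative MPJPE / PA-MPJPE computation over (J, 3) joint arrays in mm.
# Root alignment and similarity Procrustes are common conventions, assumed
# here for illustration rather than taken from this paper's evaluation code.
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: average Euclidean distance per joint."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after similarity (Procrustes) alignment of pred to gt."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation and scale via SVD of the cross-covariance matrix.
    U, S, Vt = np.linalg.svd(g.T @ p)
    R = U @ Vt
    if np.linalg.det(R) < 0:   # avoid reflections
        U[:, -1] *= -1
        S[-1] *= -1
        R = U @ Vt
    scale = S.sum() / (p ** 2).sum()
    aligned = scale * p @ R.T + mu_g
    return mpjpe(aligned, gt)
```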

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)