Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Monocular, One-stage, Regression of Multiple 3D People

Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei

Published: 2020-08-27 · ICCV 2021
Tasks: 3D Human Pose Estimation · Regression · Multi-Person Pose Estimation · 3D Depth Estimation · 3D Multi-Person Pose Estimation · 3D Multi-Person Mesh Recovery
Paper · PDF · Code (official)

Abstract

This paper focuses on the regression of multiple 3D people from a single RGB image. Existing approaches predominantly follow a multi-stage pipeline that first detects people in bounding boxes and then independently regresses their 3D body meshes. In contrast, we propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). The approach is conceptually simple, bounding box-free, and able to learn a per-pixel representation in an end-to-end manner. Our method simultaneously predicts a Body Center heatmap and a Mesh Parameter map, which can jointly describe the 3D body mesh on the pixel level. Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map. Equipped with such a fine-grained representation, our one-stage framework is free of the complex multi-stage process and more robust to occlusion. Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks, including 3DPW and CMU Panoptic. Experiments on crowded/occluded datasets demonstrate the robustness under various types of occlusion. The released code is the first real-time implementation of monocular multi-person 3D mesh regression.
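The body-center-guided sampling described in the abstract can be illustrated with a short sketch: local maxima on the Body Center heatmap select the pixels at which per-person parameter vectors are read out of the Mesh Parameter map. This is a minimal NumPy illustration of the idea only; the array shapes, the confidence threshold, and the 3x3 peak window are assumptions for the example, not the paper's exact configuration.

```python
import numpy as np

def extract_people(center_heatmap, param_map, threshold=0.3):
    """Sample mesh parameters at detected body centers.

    center_heatmap: (H, W) confidence map of body centers.
    param_map: (C, H, W) per-pixel mesh parameter map.
    Returns a list of (C,) parameter vectors, one per detected person.
    """
    # Pad with -inf so border pixels have a full 3x3 neighbourhood.
    padded = np.pad(center_heatmap, 1, mode="constant", constant_values=-np.inf)
    # A pixel is a peak if it is >= all values in its 3x3 window
    # and its confidence exceeds the threshold.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    is_peak = (center_heatmap >= windows.max(axis=(2, 3))) & (center_heatmap > threshold)
    ys, xs = np.nonzero(is_peak)
    # Read the parameter vector at each detected body center.
    return [param_map[:, y, x] for y, x in zip(ys, xs)]
```

Because each person is recovered by indexing a single map, no bounding-box detection or per-person cropping stage is needed, which is the point of the one-stage design.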

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | Relative Human | PCDR | 54.84 | ROMP |
| Depth Estimation | Relative Human | PCDR-Adult | 55.34 | ROMP |
| Depth Estimation | Relative Human | PCDR-Baby | 30.08 | ROMP |
| Depth Estimation | Relative Human | PCDR-Kid | 48.41 | ROMP |
| Depth Estimation | Relative Human | PCDR-Teen | 51.12 | ROMP |
| Depth Estimation | Relative Human | mPCDK | 0.866 | ROMP |
| 3D Human Pose Estimation | EMDB | Average MPJAE (deg) | 26.5975 | ROMP |
| 3D Human Pose Estimation | EMDB | Average MPJAE-PA (deg) | 23.9901 | ROMP |
| 3D Human Pose Estimation | EMDB | Average MPJPE (mm) | 112.652 | ROMP |
| 3D Human Pose Estimation | EMDB | Average MPJPE-PA (mm) | 75.1869 | ROMP |
| 3D Human Pose Estimation | EMDB | Average MVE (mm) | 134.863 | ROMP |
| 3D Human Pose Estimation | EMDB | Average MVE-PA (mm) | 90.648 | ROMP |
| 3D Human Pose Estimation | EMDB | Jitter (10 m/s^3) | 71.2556 | ROMP |
| 3D Human Pose Estimation | Panoptic | Average MPJPE (mm) | 127.6 | ROMP (ResNet-50) |
| 3D Human Pose Estimation | 3D Poses in the Wild Challenge | MPJPE | 81.76 | ROMP |
| 3D Multi-Person Pose Estimation | Relative Human | PCDR | 68.27 | ROMP |
| Multi-Person Pose Estimation | CrowdPose | mAP @0.5:0.95 | 58.6 | ROMP+CAR |
| Multi-Person Pose Estimation | CrowdPose | mAP @0.5:0.95 | 55.6 | ROMP |
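The MPJPE and MPJPE-PA numbers above follow the standard definitions: mean Euclidean distance between predicted and ground-truth joints, optionally after Procrustes (similarity) alignment of the prediction to the ground truth. A minimal sketch of both metrics, assuming (J, 3) joint arrays in millimetres:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: pred and gt are (J, 3) arrays, in mm."""
    return np.linalg.norm(pred - gt, axis=1).mean()

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment: similarity transform of pred onto gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation via SVD of the cross-covariance matrix (Kabsch).
    U, s, Vt = np.linalg.svd(p.T @ g)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # avoid reflections
        Vt[-1] *= -1
        s[-1] *= -1
        R = Vt.T @ U.T
    scale = s.sum() / (p ** 2).sum()
    aligned = scale * p @ R.T + mu_g
    return mpjpe(aligned, gt)
```

PA-MPJPE factors out global rotation, translation, and scale, so it isolates articulated pose error, which is why it is consistently lower than raw MPJPE in the table.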
