Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Probabilistic and Geometric Depth: Detecting Objects in Perspective

Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin

2021-07-29
Tasks: Attribute, Monocular 3D Object Detection, Depth Estimation, object-detection, 3D Object Detection, Object Detection

Abstract

3D object detection is an important capability needed in various practical applications such as driver assistance systems. Monocular 3D detection, as a representative general setting among image-based approaches, provides a more economical solution than conventional settings relying on LiDARs but still yields unsatisfactory results. This paper first presents a systematic study on this problem. We observe that the current monocular 3D detection can be simplified as an instance depth estimation problem: The inaccurate instance depth blocks all the other 3D attribute predictions from improving the overall detection performance. Moreover, recent methods directly estimate the depth based on isolated instances or pixels while ignoring the geometric relations across different objects. To this end, we construct geometric relation graphs across predicted objects and use the graph to facilitate depth estimation. As the preliminary depth estimation of each instance is usually inaccurate in this ill-posed setting, we incorporate a probabilistic representation to capture the uncertainty. It provides an important indicator to identify confident predictions and further guide the depth propagation. Despite the simplicity of the basic idea, our method, PGD, obtains significant improvements on KITTI and nuScenes benchmarks, achieving 1st place out of all monocular vision-only methods while still maintaining real-time efficiency. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.
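
The abstract describes two ideas: a probabilistic depth representation that yields a confidence for each instance, and a propagation step in which confident predictions guide the depth of related objects. The sketch below illustrates one way such a scheme could look. It is not the authors' implementation. The discretized depth bins, the use of the maximum bin probability as confidence, the confidence-weighted average, and every function name are assumptions made for illustration; how neighbor-implied depths are derived from the geometric relation graph is not covered by the abstract and is taken as a given input here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probabilistic_depth(logits, bin_centers):
    """Return (expected depth, confidence) from per-bin logits.

    Assumption: depth is discretized into bins, the estimate is the
    expectation over bin centers, and the confidence is the largest
    bin probability.
    """
    if len(logits) != len(bin_centers):
        raise ValueError("logits and bin_centers must have the same length")
    probs = softmax(logits)
    depth = sum(p * d for p, d in zip(probs, bin_centers))
    confidence = max(probs)
    return depth, confidence


def propagate_depth(own, neighbors):
    """Confidence-weighted average of depth estimates.

    `own` is a (depth, confidence) pair for the instance itself.
    `neighbors` is a list of (depth, confidence) pairs, each a
    hypothetical depth for this instance implied by a related object.
    """
    candidates = [own] + list(neighbors)
    total_weight = sum(conf for _, conf in candidates)
    if total_weight == 0:
        return own[0]
    return sum(depth * conf for depth, conf in candidates) / total_weight


# Usage: an uncertain instance is pulled toward a confident neighbor.
depth, conf = probabilistic_depth([0.0, 0.0, 0.0], [10.0, 20.0, 30.0])
fused = propagate_depth((depth, conf), [(30.0, 0.9)])
```

With uniform logits the instance's own estimate sits at the middle of the bin range with low confidence, so the fused depth moves most of the way toward the confident neighbor's value.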

Results

The source lists the same six results under each of these task labels: Object Detection, 3D, 3D Object Detection, 2D Classification, 2D Object Detection, 16k.

Dataset                  Metric     Value  Model
KITTI Cars Hard val      AP         16.9   PGD
nuScenes                 NDS        0.45   PGD
nuScenes                 mAP        0.39   PGD
KITTI Cars Moderate val  AP         18.34  PGD
KITTI Cars Easy val      AP         24.35  PGD
KITTI Cars Moderate      AP Medium  11.76  PGD
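
The nuScenes rows report both NDS and mAP. As a reading aid, the sketch below shows how the nuScenes detection score combines mAP with the five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE), using the standard definition NDS = (5 * mAP + sum of (1 - min(1, error))) / 10. The function names are hypothetical, and the per-metric errors for PGD do not appear on this page, so the example only derives what the listed (rounded) values imply about their combined score.

```python
def nds(m_ap, tp_errors):
    """nuScenes detection score from mAP and the five mean TP errors."""
    if len(tp_errors) != 5:
        raise ValueError("expected five TP errors: mATE, mASE, mAOE, mAVE, mAAE")
    # Each error is clipped at 1 and converted to a score in [0, 1].
    tp_scores = [1.0 - min(1.0, e) for e in tp_errors]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


def implied_tp_score_sum(nds_value, m_ap):
    """Sum of the five TP scores implied by a reported NDS and mAP."""
    return 10.0 * nds_value - 5.0 * m_ap


# Values listed on this page for PGD on nuScenes (rounded to two decimals).
tp_sum = implied_tp_score_sum(0.45, 0.39)
mean_tp_score = tp_sum / 5.0
```

Because the listed values are rounded, the implied sum of roughly 2.55 (a mean TP score of about 0.51) is only approximate.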

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
Non-Adaptive Adversarial Face Generation (2025-07-16)