Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors

Fanqi Pu, Yifan Wang, Jiru Deng, Wenming Yang

2024-10-25 | CVPR 2025
Tasks: Monocular 3D Object Detection, Depth Prediction, Depth Estimation, object-detection, 3D Object Detection, Object Detection

Abstract

Perspective projection has been extensively utilized in monocular 3D object detection methods. It introduces geometric priors from 2D bounding boxes and 3D object dimensions to reduce the uncertainty of depth estimation. However, due to depth errors originating from the object's visual surface, the height of the bounding box often fails to represent the actual projected central height, which undermines the effectiveness of geometric depth. Direct prediction for the projected height unavoidably results in a loss of 2D priors, while multi-depth prediction with complex branches does not fully leverage geometric depth. This paper presents a Transformer-based monocular 3D object detection method called MonoDGP, which adopts perspective-invariant geometry errors to modify the projection formula. We also try to systematically discuss and explain the mechanisms and efficacy behind geometry errors, which serve as a simple but effective alternative to multi-depth prediction. Additionally, MonoDGP decouples the depth-guided decoder and constructs a 2D decoder only dependent on visual features, providing 2D priors and initializing object queries without the disturbance of 3D detection. To further optimize and fine-tune input tokens of the transformer decoder, we also introduce a Region Segment Head (RSH) that generates enhanced features and segment embeddings. Our monocular method demonstrates state-of-the-art performance on the KITTI benchmark without extra data. Code is available at https://github.com/PuFanqi23/MonoDGP.
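The geometric depth that the abstract refers to can be illustrated with the standard pinhole relation Z = f * H / h, where f is the focal length in pixels, H the object's 3D height, and h its projected height in the image. The sketch below is a minimal Python illustration, not the authors' implementation. The abstract does not give the exact form of MonoDGP's geometry-error term, so the additive height correction in `corrected_depth` is an assumption made only to show how such a term could enter the projection formula; the function names and numeric values are hypothetical.

```python
def geometric_depth(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Pinhole-camera depth from similar triangles: Z = f * H / h."""
    if height_2d_px <= 0:
        raise ValueError("projected height must be positive")
    return focal_px * height_3d_m / height_2d_px


def corrected_depth(focal_px: float, height_3d_m: float,
                    height_2d_px: float, height_error_m: float) -> float:
    """Hypothetical correction (an assumption, not the paper's formula):
    add an error term to the 3D height before applying the projection."""
    return geometric_depth(focal_px, height_3d_m + height_error_m, height_2d_px)


if __name__ == "__main__":
    f = 720.0  # focal length in pixels (illustrative value)
    H = 1.5    # 3D object height in metres (illustrative value)
    h = 36.0   # 2D bounding-box height in pixels (illustrative value)

    # Uncorrected geometric depth: 720 * 1.5 / 36
    print(geometric_depth(f, H, h))  # 30.0

    # With an assumed 0.1 m height error, the estimate shifts by about 2 m,
    # which shows how sensitive geometric depth is to a small height mismatch.
    print(corrected_depth(f, H, h, 0.1))
```

In this toy setting a 0.1 m mismatch between the height implied by the 2D box and the true projected central height moves the depth estimate by roughly 2 m at 30 m range, which is the kind of sensitivity that motivates modeling the error explicitly.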

Results

Task                | Dataset             | Metric    | Value | Model
Object Detection    | KITTI Cars Easy     | AP Easy   | 26.35 | MonoDGP
Object Detection    | KITTI Cars Moderate | AP Medium | 18.72 | MonoDGP
Object Detection    | KITTI Cars Hard     | AP Hard   | 15.97 | MonoDGP
3D                  | KITTI Cars Easy     | AP Easy   | 26.35 | MonoDGP
3D                  | KITTI Cars Moderate | AP Medium | 18.72 | MonoDGP
3D                  | KITTI Cars Hard     | AP Hard   | 15.97 | MonoDGP
3D Object Detection | KITTI Cars Easy     | AP Easy   | 26.35 | MonoDGP
3D Object Detection | KITTI Cars Moderate | AP Medium | 18.72 | MonoDGP
3D Object Detection | KITTI Cars Hard     | AP Hard   | 15.97 | MonoDGP
2D Classification   | KITTI Cars Easy     | AP Easy   | 26.35 | MonoDGP
2D Classification   | KITTI Cars Moderate | AP Medium | 18.72 | MonoDGP
2D Classification   | KITTI Cars Hard     | AP Hard   | 15.97 | MonoDGP
2D Object Detection | KITTI Cars Easy     | AP Easy   | 26.35 | MonoDGP
2D Object Detection | KITTI Cars Moderate | AP Medium | 18.72 | MonoDGP
2D Object Detection | KITTI Cars Hard     | AP Hard   | 15.97 | MonoDGP
16k                 | KITTI Cars Easy     | AP Easy   | 26.35 | MonoDGP
16k                 | KITTI Cars Moderate | AP Medium | 18.72 | MonoDGP
16k                 | KITTI Cars Hard     | AP Hard   | 15.97 | MonoDGP

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)