Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection

Yinmin Zhang, Xinzhu Ma, Shuai Yi, Jun Hou, Zhihui Wang, Wanli Ouyang, Dan Xu

2021-07-29
Representation Learning, Monocular 3D Object Detection, Autonomous Driving, Depth Estimation, object-detection, 3D Object Detection, Object Detection

Abstract

As a crucial task of autonomous driving, 3D object detection has made great progress in recent years. However, monocular 3D object detection remains a challenging problem due to the unsatisfactory performance in depth estimation. Most existing monocular methods typically directly regress the scene depth while ignoring important relationships between the depth and various geometric elements (e.g. bounding box sizes, 3D object dimensions, and object poses). In this paper, we propose to learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection. Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised. We further implement and embed the proposed formula to enable geometry-aware deep representation learning, allowing effective 2D and 3D interactions for boosting the depth estimation. Moreover, we provide a strong baseline through addressing substantial misalignment between 2D annotation and projected boxes to ensure robust learning with the proposed geometric formula. Experiments on the KITTI dataset show that our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting. The model and code will be released at https://github.com/YinminZhang/MonoGeo.
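
The abstract does not spell out the geometry formula, so the following is only a minimal sketch of the standard pinhole-camera relation that links depth to an object's 3D height and its projected 2D box height. It is an assumption about the kind of projective constraint involved, not the paper's actual formulation, and the function name `depth_from_height` and the numeric values are hypothetical.

```python
def depth_from_height(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Estimate object depth (metres) from the pinhole projection relation.

    Under a pinhole camera, an object of physical height H at depth z projects
    to an image height h = f * H / z, so z = f * H / h.
    This is a textbook relation, assumed here for illustration only.
    """
    if height_2d_px <= 0:
        raise ValueError("2D box height must be positive")
    return focal_px * height_3d_m / height_2d_px


# Hypothetical values: 700 px focal length, 1.5 m tall car, 50 px tall 2D box.
example_depth = depth_from_height(700.0, 1.5, 50.0)
print(example_depth)
```

The relation shows why errors in the predicted 3D dimensions or in the 2D box height carry over directly into the depth estimate, which is consistent with the abstract's emphasis on aligning 2D annotations with projected boxes.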

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | KITTI Cars Moderate | AP Medium | 13.81 | MonoGeo |
| 3D | KITTI Cars Moderate | AP Medium | 13.81 | MonoGeo |
| 3D Object Detection | KITTI Cars Moderate | AP Medium | 13.81 | MonoGeo |
| 2D Classification | KITTI Cars Moderate | AP Medium | 13.81 | MonoGeo |
| 2D Object Detection | KITTI Cars Moderate | AP Medium | 13.81 | MonoGeo |
| 16k | KITTI Cars Moderate | AP Medium | 13.81 | MonoGeo |

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)