Geometry Uncertainty Projection Network for Monocular 3D Object Detection

Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Junjie Yan, Wanli Ouyang

Published: 2021-07-29, ICCV 2021
Tasks: 3D Object Detection From Monocular Images, Monocular 3D Object Detection, Depth Estimation, object-detection, 3D Object Detection, Object Detection

Abstract

Geometry Projection is a powerful depth estimation method in monocular 3D object detection. It estimates depth from object heights, which introduces mathematical priors into the deep model. However, the projection process also introduces the error amplification problem, in which the error of the estimated height is amplified and strongly reflected in the output depth. This property leads to uncontrollable depth inferences and also damages training efficiency. In this paper, we propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both the inference and training stages. Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth, which not only provides a highly reliable confidence for each depth but also benefits depth learning. Furthermore, at the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification. This learning algorithm monitors the learning situation of each task with a proposed indicator and adaptively assigns proper loss weights to different tasks according to the situation of their pre-tasks. As a result, each task starts learning only when its pre-tasks are learned well, which can significantly improve the stability and efficiency of the training process. Extensive experiments demonstrate the effectiveness of the proposed method. The overall model can infer more reliable object depth than existing methods and outperforms the state-of-the-art image-based monocular 3D detectors by 3.74% and 4.7% AP40 on the car and pedestrian categories of the KITTI benchmark.
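The error amplification described in the abstract can be illustrated with a minimal sketch. It assumes the standard pinhole relation depth = focal length × 3D height / 2D height, which the abstract does not state explicitly. The function names and the numeric values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def project_depth(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Depth from the pinhole relation d = f * H / h (assumed, not taken from the abstract)."""
    return focal_px * height_3d_m / height_2d_px


def depth_error(focal_px: float, height_err_m: float, height_2d_px: float) -> float:
    """Depth error caused by an error in the estimated 3D height.

    Because d is linear in H, an error dH maps to f * |dH| / h in depth,
    so the same height error hurts more for objects that appear small.
    """
    return focal_px * abs(height_err_m) / height_2d_px


if __name__ == "__main__":
    focal_px = 700.0      # hypothetical focal length in pixels
    height_3d_m = 1.5     # hypothetical object height in metres
    height_err_m = 0.1    # hypothetical error in the estimated 3D height

    # Smaller projected heights correspond to more distant objects.
    for height_2d_px in (120.0, 60.0, 30.0):
        d = project_depth(focal_px, height_3d_m, height_2d_px)
        e = depth_error(focal_px, height_err_m, height_2d_px)
        print(f"2D height {height_2d_px:6.1f} px: depth {d:6.2f} m, depth error {e:5.2f} m")
```

Under these assumptions the amplification factor is f / h, so a fixed height error produces a larger depth error as the projected 2D height shrinks. This is one way to read the abstract's motivation for attaching an uncertainty to each projected depth.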

Results

Task | Dataset | Metric | Value | Model
Object Detection | Waymo Open Dataset | 3D mAPH Vehicle (Front Camera Only) | 2.14 | GUP Net
Object Detection | KITTI-360 | AP25 | 27.25 | GUPNet
Object Detection | KITTI-360 | AP50 | 0.87 | GUPNet
3D | Waymo Open Dataset | 3D mAPH Vehicle (Front Camera Only) | 2.14 | GUP Net
3D | KITTI-360 | AP25 | 27.25 | GUPNet
3D | KITTI-360 | AP50 | 0.87 | GUPNet
2D Classification | Waymo Open Dataset | 3D mAPH Vehicle (Front Camera Only) | 2.14 | GUP Net
2D Classification | KITTI-360 | AP25 | 27.25 | GUPNet
2D Classification | KITTI-360 | AP50 | 0.87 | GUPNet
2D Object Detection | Waymo Open Dataset | 3D mAPH Vehicle (Front Camera Only) | 2.14 | GUP Net
2D Object Detection | KITTI-360 | AP25 | 27.25 | GUPNet
2D Object Detection | KITTI-360 | AP50 | 0.87 | GUPNet
16k | Waymo Open Dataset | 3D mAPH Vehicle (Front Camera Only) | 2.14 | GUP Net
16k | KITTI-360 | AP25 | 27.25 | GUPNet
16k | KITTI-360 | AP50 | 0.87 | GUPNet

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)