Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

Garrick Brazil, Xiaoming Liu

Published 2019-07-13 · ICCV 2019 (October)
Tasks: 3D Object Detection From Monocular Images, Region Proposal, Monocular 3D Object Detection, Scene Understanding, Autonomous Driving, Vehicle Pose Estimation, object-detection, 3D Object Detection, Object Detection

Abstract

Understanding the world in 3D is a critical component of urban autonomous driving. Generally, the combination of expensive LiDAR sensors and stereo RGB imaging has been paramount for successful 3D object detection algorithms, whereas monocular image-only methods experience drastically reduced performance. We propose to reduce the gap by reformulating the monocular 3D detection problem as a standalone 3D region proposal network. We leverage the geometric relationship of 2D and 3D perspectives, allowing 3D boxes to utilize well-known and powerful convolutional features generated in the image space. To help address the demanding estimation of 3D parameters, we further design depth-aware convolutional layers, which enable location-specific feature development and, consequently, improved 3D scene understanding. Compared to prior work in monocular 3D detection, our method consists of only the proposed 3D region proposal network rather than relying on external networks, data, or multiple stages. M3D-RPN significantly improves the performance of both monocular 3D object detection and Bird's Eye View tasks on the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.
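The abstract's key geometric idea is that 3D boxes can be tied to image-space features through the camera projection. The sketch below illustrates only that link: it projects the eight corners of a 3D box with a 3×4 pinhole projection matrix (the layout KITTI uses for its `P2` calibration) and takes the tight 2D box around them. It is not the M3D-RPN anchor code; the function names, the center-based box parameterization, and the intrinsics in the usage note are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# 3x4 projection matrix in the KITTI "P2" layout: pixel ~ P @ [x, y, z, 1]^T
Matrix3x4 = Sequence[Sequence[float]]


def project_point(P: Matrix3x4, x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Project a camera-frame 3D point to pixel coordinates (u, v); also return its depth."""
    X = (x, y, z, 1.0)
    # Homogeneous image coordinates [u*d, v*d, d] = P @ X
    hu, hv, d = (sum(P[r][c] * X[c] for c in range(4)) for r in range(3))
    if d <= 0:
        raise ValueError("point lies behind the camera")
    return hu / d, hv / d, d


def box_corners(cx: float, cy: float, cz: float, w: float, h: float, l: float,
                ry: float) -> List[Tuple[float, float, float]]:
    """Eight corners of a 3D box rotated by yaw ry about the camera y-axis.

    Assumption (illustrative): (cx, cy, cz) is the geometric center of the box;
    KITTI labels store the bottom-face center instead.
    """
    c, s = math.cos(ry), math.sin(ry)
    corners = []
    for dx in (-l / 2, l / 2):          # length
        for dy in (-h / 2, h / 2):      # height, along the camera y-axis
            for dz in (-w / 2, w / 2):  # width
                # Rotate the offset about the y-axis, then translate to the box center
                corners.append((cx + c * dx + s * dz, cy + dy, cz - s * dx + c * dz))
    return corners


def projected_2d_box(P: Matrix3x4, box: Tuple[float, ...]) -> Tuple[float, float, float, float]:
    """Tight image-space box (u1, v1, u2, v2) around the projected corners of a 3D box."""
    pts = [project_point(P, *corner)[:2] for corner in box_corners(*box)]
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    return min(us), min(vs), max(us), max(vs)
```

With made-up intrinsics `P = [[700.0, 0.0, 600.0, 0.0], [0.0, 700.0, 180.0, 0.0], [0.0, 0.0, 1.0, 0.0]]`, `project_point(P, 0.0, 0.0, 20.0)` returns `(600.0, 180.0, 20.0)`, and `projected_2d_box(P, (0.0, 0.0, 20.0, 1.6, 1.5, 3.9, 0.0))` gives a box centered horizontally on `u = 600`. Going the other way, from a 2D anchor plus regressed depth back to 3D, uses the same matrix.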
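Depth-aware convolution gives different image regions their own filters. The abstract only says the layers enable "location-specific" features; as we understand the paper, the feature map is split into horizontal row bands, each with its own kernel, since in a forward-facing driving camera a row's vertical position correlates roughly with depth. The single-channel, pure-Python sketch below shows only that row-binning idea under this assumption. The function names are hypothetical, and the actual layer operates on multi-channel feature maps inside a learned network.

```python
from typing import List

Grid = List[List[float]]


def conv_at(x: Grid, k: Grid, i: int, j: int) -> float:
    """Zero-padded cross-correlation of an odd-sized kernel k centered at pixel (i, j)."""
    H, W, r = len(x), len(x[0]), len(k) // 2
    total = 0.0
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            ii, jj = i + di, j + dj
            if 0 <= ii < H and 0 <= jj < W:  # pixels outside the image count as zero
                total += k[di + r][dj + r] * x[ii][jj]
    return total


def depth_aware_conv(x: Grid, kernels: List[Grid]) -> Grid:
    """Row-binned convolution (illustrative assumption of how location-specific
    kernels are assigned): rows are split into len(kernels) horizontal bands,
    and every output pixel in band b is computed with kernels[b] only."""
    H, W, b = len(x), len(x[0]), len(kernels)
    out = []
    for i in range(H):
        band = i * b // H  # index of the horizontal band that row i falls into
        out.append([conv_at(x, kernels[band], i, j) for j in range(W)])
    return out
```

For a 4×3 all-ones input and two 3×3 kernels that scale the center pixel by 1 and by 2, the top two output rows are all `1.0` and the bottom two all `2.0`. With a single kernel the layer reduces to an ordinary zero-padded convolution, so the design trades extra parameters for spatially varying filters.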

Results

Dataset | Metric | Value | Model | Listed under task(s)
KITTI Cars Hard | Average Orientation Similarity | 67.08 | M3D-RPN | Pose Estimation; 3D; 1 Image, 2*2 Stitchi
Rope3D | AP@0.7 | 16.75 | M3D-RPN+(G) | Object Detection; 3D; 3D Object Detection; 2D Classification; 2D Object Detection; 16k
KITTI Cars Moderate | AP Medium | 9.71 | M3D-RPN | Object Detection; 3D; 3D Object Detection; 2D Classification; 2D Object Detection; 16k
Waymo Open Dataset | 3D mAPH Vehicle (Front Camera Only) | 0.65 | M3D-RPN | Object Detection; 3D; 2D Classification; 2D Object Detection; 16k
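Average Orientation Similarity, reported above for KITTI Cars Hard, is KITTI's orientation-aware counterpart to average precision: each true positive is weighted by (1 + cos Δθ)/2, where Δθ is the yaw error, so an exact orientation scores 1 and a flipped one scores 0. The sketch below follows the original 11-point interpolated definition from the KITTI benchmark. It assumes detections have already been matched to ground truth (IoU thresholds, difficulty filtering, and don't-care regions are not modeled), and because the KITTI server later moved to 40 recall positions, it is not guaranteed to reproduce the 67.08 figure exactly. The function name and input format are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# (confidence score, yaw error in radians if the detection is a true positive, else None)
Detection = Tuple[float, Optional[float]]


def average_orientation_similarity(dets: List[Detection], num_gt: int) -> float:
    """11-point interpolated AOS, assuming detections were already matched to ground truth."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    ranked = sorted(dets, key=lambda d: d[0], reverse=True)
    curve = []  # (recall, orientation similarity) after each detection in score order
    tp, sim_sum = 0, 0.0
    for n, (_, dtheta) in enumerate(ranked, start=1):
        if dtheta is not None:
            tp += 1
            sim_sum += (1.0 + math.cos(dtheta)) / 2.0  # 1 for exact yaw, 0 when flipped
        # False positives add nothing to sim_sum but still count in the denominator n
        curve.append((tp / num_gt, sim_sum / n))
    total = 0.0
    for k in range(11):
        r = k / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolation: best similarity at any recall >= r (0 if r is never reached)
        total += max((s for rec, s in curve if rec >= r), default=0.0)
    return total / 11
```

If every true positive has zero yaw error, the result equals 11-point AP; any orientation error can only lower it, which is why AOS is bounded above by AP.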
