
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu

2022-03-21 | CVPR 2022
Tasks: 3D Object Detection From Monocular Images, Monocular 3D Object Detection, Autonomous Driving, object-detection, 3D Object Detection, Object Detection

Abstract

Monocular 3D object detection is an important yet challenging task in autonomous driving. Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve limited performance caused by inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. It mainly consists of two components: (1) the Depth-Aware Feature Enhancement (DFE) module that implicitly learns depth-aware features with auxiliary supervision without requiring extra computation, and (2) the Depth-Aware Transformer (DTR) module that globally integrates context- and depth-aware features. Moreover, different from conventional pixel-wise positional encodings, we introduce a novel depth positional encoding (DPE) to inject depth positional hints into transformers. Our proposed depth-aware modules can be easily plugged into existing image-only monocular 3D object detectors to improve the performance. Extensive experiments on the KITTI dataset demonstrate that our approach outperforms previous state-of-the-art monocular-based methods and achieves real-time detection. Code is available at https://github.com/kuanchihhuang/MonoDTR
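To make the idea of a depth positional encoding more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how DPE is computed, so the sketch assumes that each pixel has scores over a set of discrete depth bins and that each bin owns one embedding vector. The function names (`depth_positional_encoding`, `inject_depth_hints`), the array shapes, and the random inputs are all hypothetical and chosen only for illustration.

```python
import numpy as np


def depth_positional_encoding(depth_logits, bin_embeddings):
    """Look up one embedding vector per pixel from its most likely depth bin.

    depth_logits:   (D, H, W) scores over D discrete depth bins (assumed input).
    bin_embeddings: (D, C) table, one C-dim vector per bin (learned in a real model).
    Returns an (H, W, C) array of depth positional encodings.
    """
    if depth_logits.shape[0] != bin_embeddings.shape[0]:
        raise ValueError("number of depth bins must match the embedding table")
    bin_index = depth_logits.argmax(axis=0)  # (H, W): most likely bin per pixel
    return bin_embeddings[bin_index]         # (H, W, C): integer-array lookup


def inject_depth_hints(features, depth_logits, bin_embeddings):
    """Add depth encodings to (H, W, C) features and flatten to (H*W, C) tokens."""
    encoded = features + depth_positional_encoding(depth_logits, bin_embeddings)
    return encoded.reshape(-1, encoded.shape[-1])


# Toy usage with random stand-ins for image features and depth predictions.
rng = np.random.default_rng(0)
D, H, W, C = 8, 4, 6, 16
tokens = inject_depth_hints(
    rng.normal(size=(H, W, C)),   # hypothetical backbone features
    rng.normal(size=(D, H, W)),   # hypothetical depth-bin scores
    rng.normal(size=(D, C)),      # hypothetical embedding table
)
print(tokens.shape)  # (24, 16)
```

The point of the sketch is the contrast drawn in the abstract: the positional hint attached to each token depends on estimated depth rather than on the pixel's row and column. The released code at the URL above is the reference for how MonoDTR actually does this.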

Results

Task                 Dataset    Metric  Value  Model
Object Detection     KITTI-360  AP25    39.76  MonoDTR
Object Detection     KITTI-360  AP50    3.02   MonoDTR
3D                   KITTI-360  AP25    39.76  MonoDTR
3D                   KITTI-360  AP50    3.02   MonoDTR
2D Classification    KITTI-360  AP25    39.76  MonoDTR
2D Classification    KITTI-360  AP50    3.02   MonoDTR
2D Object Detection  KITTI-360  AP25    39.76  MonoDTR
2D Object Detection  KITTI-360  AP50    3.02   MonoDTR
16k                  KITTI-360  AP25    39.76  MonoDTR
16k                  KITTI-360  AP50    3.02   MonoDTR
