MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu

Date: 2022-03-21
Venue: CVPR 2022
Tasks: 3D Object Detection From Monocular Images, Monocular 3D Object Detection, Autonomous Driving, 3D Object Detection, Object Detection


Abstract

Monocular 3D object detection is an important yet challenging task in autonomous driving. Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve limited performance caused by inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. It mainly consists of two components: (1) the Depth-Aware Feature Enhancement (DFE) module that implicitly learns depth-aware features with auxiliary supervision without requiring extra computation, and (2) the Depth-Aware Transformer (DTR) module that globally integrates context- and depth-aware features. Moreover, different from conventional pixel-wise positional encodings, we introduce a novel depth positional encoding (DPE) to inject depth positional hints into transformers. Our proposed depth-aware modules can be easily plugged into existing image-only monocular 3D object detectors to improve the performance. Extensive experiments on the KITTI dataset demonstrate that our approach outperforms previous state-of-the-art monocular-based methods and achieves real-time detection. Code is available at https://github.com/kuanchihhuang/MonoDTR
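The contrast the abstract draws between pixel-wise positional encodings and a depth positional encoding can be illustrated with a small sketch. The abstract does not say how DPE is computed, so everything below is an assumption made for illustration: the uniform depth binning, the depth range, the fixed sinusoidal code, and all function names are hypothetical and are not taken from the MonoDTR code. The only idea shown is that a token's positional code is indexed by its estimated depth rather than by its pixel location.

```python
import math


def depth_to_bin(depth_m, d_min=2.0, d_max=60.0, num_bins=8):
    """Map a depth in metres to a uniform bin index in [0, num_bins - 1].

    The range and bin count are illustrative assumptions.
    """
    clipped = min(max(depth_m, d_min), d_max)
    idx = int((clipped - d_min) / (d_max - d_min) * num_bins)
    return min(idx, num_bins - 1)


def sinusoidal_code(index, dim):
    """Fixed sinusoidal code for an integer index (dim must be even).

    A stand-in for whatever encoding a real model would use or learn.
    """
    code = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        code.append(math.sin(index * freq))
        code.append(math.cos(index * freq))
    return code


def add_depth_positional_encoding(features, depths, dim):
    """Add a depth-indexed code to each token's feature vector."""
    out = []
    for feat, depth in zip(features, depths):
        code = sinusoidal_code(depth_to_bin(depth), dim)
        out.append([f + c for f, c in zip(feat, code)])
    return out


# Two tokens at similar depth receive the same code, wherever they sit in the image.
tokens = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
encoded = add_depth_positional_encoding(tokens, depths=[10.0, 10.5], dim=4)
```

In this sketch, tokens that fall in the same depth bin share a positional code regardless of image coordinates, which is the sense in which the encoding carries depth hints instead of pixel hints.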

Results

Task	Dataset	Metric	Value	Model
Object Detection	KITTI-360	AP25	39.76	MonoDTR
Object Detection	KITTI-360	AP50	3.02	MonoDTR
3D	KITTI-360	AP25	39.76	MonoDTR
3D	KITTI-360	AP50	3.02	MonoDTR
2D Classification	KITTI-360	AP25	39.76	MonoDTR
2D Classification	KITTI-360	AP50	3.02	MonoDTR
2D Object Detection	KITTI-360	AP25	39.76	MonoDTR
2D Object Detection	KITTI-360	AP50	3.02	MonoDTR
16k	KITTI-360	AP25	39.76	MonoDTR
16k	KITTI-360	AP50	3.02	MonoDTR
