Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble

Juhan Cha, Minseok Joo, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim

Published: 2024-07-27 · Tasks: 3D Object Detection, Object Detection
Paper · PDF · Code (official)

Abstract

Recent advancements in 3D object detection have benefited from multi-modal information from multi-view cameras and LiDAR sensors. However, the inherent disparities between the modalities pose substantial challenges. We observe that existing multi-modal 3D object detection methods rely heavily on the LiDAR sensor, treating the camera as an auxiliary modality for augmenting semantic details. This often leads not only to underutilization of camera data but also to significant performance degradation in scenarios where LiDAR data is unavailable. Additionally, existing fusion methods overlook the detrimental impact of sensor noise induced by environmental changes on detection performance. In this paper, we propose MEFormer to address the LiDAR over-reliance problem by harnessing critical information for 3D object detection from every available modality while concurrently safeguarding against corrupted signals during the fusion process. Specifically, we introduce Modality Agnostic Decoding (MOAD), which extracts geometric and semantic features with a shared transformer decoder regardless of input modalities and provides promising improvements in both single-modality and multi-modality settings. Additionally, our Proximity-based Modality Ensemble (PME) module adaptively exploits the strengths of each modality depending on the environment while mitigating the effects of a noisy sensor. Our MEFormer achieves state-of-the-art performance of 73.9% NDS and 71.5% mAP on the nuScenes validation set. Extensive analyses validate that MEFormer improves robustness under challenging conditions such as sensor malfunctions or environmental changes. The source code is available at https://github.com/hanchaa/MEFormer.
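The core intuition behind the Proximity-based Modality Ensemble can be illustrated with a highly simplified sketch: predictions from two modalities are matched by spatial proximity of their box centers, and matched pairs are merged with confidence weighting. The function name, the single distance threshold, and the weighted-average merge rule below are illustrative assumptions, not the paper's actual PME module.

```python
import numpy as np

def proximity_ensemble(centers_a, scores_a, centers_b, scores_b, radius=1.0):
    """Toy proximity-based fusion of two modalities' detections.

    For each prediction from modality A, find modality B's nearest
    prediction; if its center lies within `radius`, blend the two
    centers weighted by confidence, otherwise keep A's prediction.
    """
    fused = []
    for ca, sa in zip(centers_a, scores_a):
        # distance from this A-prediction to every B-prediction
        d = np.linalg.norm(centers_b - ca, axis=1)
        j = int(np.argmin(d))
        if d[j] < radius:
            w = sa / (sa + scores_b[j])  # confidence-based weight
            fused.append(w * ca + (1.0 - w) * centers_b[j])
        else:
            fused.append(ca)  # no nearby match: fall back to modality A
    return np.array(fused)
```

In this toy form, a noisy sensor whose boxes drift away from the other modality's simply stops influencing the fused output; the paper's actual module learns this behavior adaptively rather than using a fixed radius.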

Results

Task                | Dataset  | Metric | Value | Model
3D Object Detection | nuScenes | NDS    | 0.74  | MEFormer
3D Object Detection | nuScenes | mAP    | 0.72  | MEFormer
3D Object Detection | nuScenes | mATE   | 0.27  | MEFormer
3D Object Detection | nuScenes | mASE   | 0.24  | MEFormer
3D Object Detection | nuScenes | mAOE   | 0.3   | MEFormer
3D Object Detection | nuScenes | mAVE   | 0.27  | MEFormer
3D Object Detection | nuScenes | mAAE   | 0.11  | MEFormer

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)