MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion

Zizhang Wu, Guilian Chen, Yuanzhu Gan, Lei Wang, Jian Pu

2023-02-21
Autonomous Driving, Object Detection, 3D Object Detection

Abstract

Multi-view radar-camera fused 3D object detection provides a longer detection range and more useful features for autonomous driving, especially under adverse weather. Current radar-camera fusion methods offer various designs for fusing radar information with camera data. However, these approaches usually rely on straightforward concatenation of the multi-modal features, which neglects the semantic alignment of the radar features and fails to capture sufficient correlations across modalities. In this paper, we present MVFusion, a novel Multi-View radar-camera Fusion method that produces semantic-aligned radar features and enhances cross-modal information interaction. To this end, we inject semantic alignment into the radar features via the semantic-aligned radar encoder (SARE) to produce image-guided radar features. We then propose the radar-guided fusion transformer (RGFT), which fuses the radar and image features and strengthens the correlation between the two modalities at a global scope via the cross-attention mechanism. Extensive experiments show that MVFusion achieves state-of-the-art performance (51.7% NDS and 45.3% mAP) on the nuScenes dataset. We will release our code and trained networks upon publication.
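
The abstract names cross-attention as the mechanism that fuses radar and image features, but gives no architectural detail. The following is a minimal, generic sketch of single-head scaled dot-product cross-attention in NumPy, not the authors' RGFT. The choice of image tokens as queries and radar tokens as keys and values, the token counts, the feature width, and all names (`cross_attention`, `image_tokens`, `radar_tokens`, `w_q`, `w_k`, `w_v`) are assumptions made purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d) tokens from one modality (assumed here: image).
    context: (n_c, d) tokens from the other modality (assumed here: radar).
    Returns the attended features (n_q, d_v) and the weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ v, weights


# Hypothetical sizes: 6 image tokens, 4 radar tokens, 16-dim features.
rng = np.random.default_rng(0)
d_model = 16
image_tokens = rng.normal(size=(6, d_model))
radar_tokens = rng.normal(size=(4, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))

fused, attn = cross_attention(image_tokens, radar_tokens, w_q, w_k, w_v)
```

Because every query attends over all context tokens, each fused feature can draw on radar evidence from anywhere in the scene, which is the "global scope" property that plain concatenation lacks.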

Results

Task	Dataset	Metric	Value (%)	Model
3D Object Detection	nuScenes Camera-Radar	NDS	51.7	MVFusion
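
For context on the metric, the nuScenes detection score (NDS) combines mAP with five mean true-positive error terms (translation, scale, orientation, velocity, attribute), with mAP weighted five times as heavily as each error term. The sketch below follows the nuScenes benchmark definition rather than anything stated on this page, and the function names `nds` and `implied_tp_score_sum` are hypothetical. The per-error values for MVFusion are not reported here, so the example only derives what the reported NDS and mAP jointly imply.

```python
def nds(m_ap, tp_errors):
    """nuScenes detection score from mAP and the five mean TP errors.

    m_ap: mean average precision in [0, 1].
    tp_errors: the five mean TP errors (mATE, mASE, mAOE, mAVE, mAAE).
    Each error is clipped to 1 and converted to a score 1 - min(1, error).
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five TP error terms")
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


def implied_tp_score_sum(nds_value, m_ap):
    """Sum of the five TP scores implied by a reported NDS and mAP."""
    return 10.0 * nds_value - 5.0 * m_ap


# Reported values from the abstract: 51.7% NDS and 45.3% mAP.
tp_sum = implied_tp_score_sum(0.517, 0.453)
```

With the reported numbers, the five TP scores sum to roughly 2.9 out of a possible 5, subject to rounding in the published figures.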
