Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes

Tim Broedermann, Christos Sakaridis, Yuqian Fu, Luc van Gool

2024-10-14 · Sensor Fusion · Panoptic Segmentation · Autonomous Driving · Semantic Segmentation

Paper · PDF · Code (official)

Abstract

Leveraging multiple sensors is crucial for robust semantic perception in autonomous driving, as each sensor type has complementary strengths and weaknesses. However, existing sensor fusion methods often treat sensors uniformly across all conditions, leading to suboptimal performance. By contrast, we propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities. We further introduce modality-specific feature adapters to align diverse sensor inputs into a shared latent space, enabling efficient integration with a single, shared pre-trained backbone. By dynamically adapting sensor fusion based on the actual condition, our model significantly improves robustness and accuracy, especially in adverse-condition scenarios. CAFuser ranks first on the public MUSES benchmarks, achieving 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for semantic segmentation, and also sets the new state of the art on DeLiVER. The source code is publicly available at: https://github.com/timbroed/CAFuser.
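The abstract describes a pipeline of three parts: a condition classifier on the RGB input that emits a Condition Token, modality-specific adapters that project each sensor's features into a shared latent space, and a token-guided fusion step. The sketch below is only an illustration of that data flow with plain numpy linear layers and softmax gating; the dimensions, weight matrices, and gating scheme are hypothetical stand-ins, not the authors' implementation (see the linked repository for the real architecture).

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(x, w):
    """Modality-specific feature adapter: project one sensor's raw
    features into the shared latent space (illustrative linear map)."""
    return x @ w

def condition_token(rgb_feat, w_cls):
    """Predict the environmental condition from RGB features and return
    a soft Condition Token (softmax over hypothetical condition logits)."""
    logits = rgb_feat @ w_cls
    e = np.exp(logits - logits.max())
    return e / e.sum()

def condition_aware_fusion(modal_feats, token, w_gate):
    """Fuse the aligned modality features with per-modality weights
    derived from the Condition Token (a stand-in for token-guided fusion)."""
    gates = token @ w_gate                      # one logit per modality
    gates = np.exp(gates - gates.max())
    gates = gates / gates.sum()                 # normalized fusion weights
    fused = sum(g * f for g, f in zip(gates, modal_feats))
    return fused, gates

# Hypothetical sizes: 3 modalities (camera, lidar, radar), 16-dim raw
# features, 8-dim shared latent space, 4 environmental conditions.
d_in, d_lat, n_cond, n_mod = 16, 8, 4, 3
raw = [rng.standard_normal(d_in) for _ in range(n_mod)]
W_ad = [rng.standard_normal((d_in, d_lat)) for _ in range(n_mod)]
W_cls = rng.standard_normal((d_in, n_cond))
W_gate = rng.standard_normal((n_cond, n_mod))

aligned = [adapter(x, w) for x, w in zip(raw, W_ad)]  # shared latent space
token = condition_token(raw[0], W_cls)                # RGB drives the token
fused, gates = condition_aware_fusion(aligned, token, W_gate)
```

Because the adapters map every modality into one latent space, a single shared backbone can consume `fused` regardless of which sensors dominate under the current condition.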

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | DeLiVER | mIoU | 67.8 | CAFuser |
| Semantic Segmentation | DeLiVER | test mIoU | 55.6 | CAFuser |
| Semantic Segmentation | MUSES: MUlti-SEnsor Semantic perception dataset | mIoU | 78.2 | CAFuser (Swin-T) |
| Semantic Segmentation | DeLiVER test | mIoU | 55.6 | CAFuser |
| Semantic Segmentation | DeLiVER | mIoU | 68.6 | CAFuser-CAA |
| Semantic Segmentation | MUSES: MUlti-SEnsor Semantic perception dataset | PQ | 59.7 | CAFuser (Swin-T) |
| Panoptic Segmentation | MUSES: MUlti-SEnsor Semantic perception dataset | PQ | 59.7 | CAFuser (Swin-T) |

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
- AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
- World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
- Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
- Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
- LaViPlan: Language-Guided Visual Path Planning with RLVR (2025-07-17)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)