Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes

Tim Broedermann, Christos Sakaridis, Yuqian Fu, Luc van Gool

2024-10-14 · Sensor Fusion · Panoptic Segmentation · Autonomous Driving · Semantic Segmentation

Paper · PDF · Code (official)

Abstract

Leveraging multiple sensors is crucial for robust semantic perception in autonomous driving, as each sensor type has complementary strengths and weaknesses. However, existing sensor fusion methods often treat sensors uniformly across all conditions, leading to suboptimal performance. By contrast, we propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities. We further introduce modality-specific feature adapters to align diverse sensor inputs into a shared latent space, enabling efficient integration with a single, shared pre-trained backbone. By dynamically adapting sensor fusion based on the actual condition, our model significantly improves robustness and accuracy, especially in adverse-condition scenarios. CAFuser ranks first on the public MUSES benchmarks, achieving 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for semantic segmentation, and also sets the new state of the art on DeLiVER. The source code is publicly available at: https://github.com/timbroed/CAFuser.
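The abstract describes a pipeline of three parts: a condition classifier on the RGB input that emits a Condition Token, modality-specific adapters that project each sensor's features into a shared latent space, and a token-guided fusion step. The sketch below is only an illustration of that data flow with plain numpy linear layers and softmax gating; the dimensions, weight matrices, and gating scheme are hypothetical stand-ins, not the authors' implementation (see the linked repository for the real architecture).

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(x, w):
    """Modality-specific feature adapter: project one sensor's raw
    features into the shared latent space (illustrative linear map)."""
    return x @ w

def condition_token(rgb_feat, w_cls):
    """Predict the environmental condition from RGB features and return
    a soft Condition Token (softmax over hypothetical condition logits)."""
    logits = rgb_feat @ w_cls
    e = np.exp(logits - logits.max())
    return e / e.sum()

def condition_aware_fusion(modal_feats, token, w_gate):
    """Fuse the aligned modality features with per-modality weights
    derived from the Condition Token (a stand-in for token-guided fusion)."""
    gates = token @ w_gate                      # one logit per modality
    gates = np.exp(gates - gates.max())
    gates = gates / gates.sum()                 # normalized fusion weights
    fused = sum(g * f for g, f in zip(gates, modal_feats))
    return fused, gates

# Hypothetical sizes: 3 modalities (camera, lidar, radar), 16-dim raw
# features, 8-dim shared latent space, 4 environmental conditions.
d_in, d_lat, n_cond, n_mod = 16, 8, 4, 3
raw = [rng.standard_normal(d_in) for _ in range(n_mod)]
W_ad = [rng.standard_normal((d_in, d_lat)) for _ in range(n_mod)]
W_cls = rng.standard_normal((d_in, n_cond))
W_gate = rng.standard_normal((n_cond, n_mod))

aligned = [adapter(x, w) for x, w in zip(raw, W_ad)]  # shared latent space
token = condition_token(raw[0], W_cls)                # RGB drives the token
fused, gates = condition_aware_fusion(aligned, token, W_gate)
```

Because the adapters map every modality into one latent space, a single shared backbone can consume `fused` regardless of which sensors dominate under the current condition.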

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | DeLiVER | mIoU | 67.8 | CAFuser |
| Semantic Segmentation | DeLiVER | test mIoU | 55.6 | CAFuser |
| Semantic Segmentation | MUSES: MUlti-SEnsor Semantic perception dataset | mIoU | 78.2 | CAFuser (Swin-T) |
| Semantic Segmentation | DeLiVER test | mIoU | 55.6 | CAFuser |
| Semantic Segmentation | DeLiVER | mIoU | 68.6 | CAFuser-CAA |
| Semantic Segmentation | MUSES: MUlti-SEnsor Semantic perception dataset | PQ | 59.7 | CAFuser (Swin-T) |
| Panoptic Segmentation | MUSES: MUlti-SEnsor Semantic perception dataset | PQ | 59.7 | CAFuser (Swin-T) |

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
- AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
- World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
- Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
- Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
- LaViPlan: Language-Guided Visual Path Planning with RLVR (2025-07-17)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)