Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes

Danial Qashqai, Emad Mousavian, Shahriar Baradaran Shokouhi, Sattar Mirzakuchaki

2024-07-01
Autonomous Vehicles, Thermal Image Segmentation, Real-Time Semantic Segmentation, Scene Understanding, Segmentation, Semantic Segmentation, Image Segmentation

Abstract

Semantic segmentation, as a crucial component of complex visual interpretation, plays a fundamental role in autonomous vehicle vision systems. Recent studies have significantly improved the accuracy of semantic segmentation by exploiting complementary information and developing multimodal methods. Despite the gains in accuracy, multimodal semantic segmentation methods suffer from high computational complexity and low inference speed. Therefore, it is a challenging task to implement multimodal methods in driving applications. To address this problem, we propose the Cosine Similarity Fusion Network (CSFNet) as a real-time RGB-X semantic segmentation model. Specifically, we design a Cosine Similarity Attention Fusion Module (CS-AFM) that effectively rectifies and fuses features of two modalities. The CS-AFM module leverages cross-modal similarity to achieve high generalization ability. By enhancing the fusion of cross-modal features at lower levels, CS-AFM paves the way for the use of a single-branch network at higher levels. Therefore, we use dual and single-branch architectures in an encoder, along with an efficient context module and a lightweight decoder for fast and accurate predictions. To verify the effectiveness of CSFNet, we use the Cityscapes, MFNet, and ZJU datasets for the RGB-D/T/P semantic segmentation. According to the results, CSFNet has competitive accuracy with state-of-the-art methods while being state-of-the-art in terms of speed among multimodal semantic segmentation models. It also achieves high efficiency due to its low parameter count and computational complexity. The source code for CSFNet will be available at https://github.com/Danial-Qashqai/CSFNet.
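The abstract does not specify how CS-AFM computes its attention, so the following is only a minimal sketch of the general idea of weighting a second modality by its cosine similarity to the RGB features. It is not the authors' module. The function names, the per-channel weighting, the clamping to non-negative values, and the additive fusion rule are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def fuse_channels(rgb, aux):
    """Fuse two feature maps channel by channel (hypothetical rule).

    rgb, aux: lists of channels, each channel a flattened list of
    spatial activations. Each auxiliary (X-modality) channel is scaled
    by its cosine similarity to the matching RGB channel, clamped at
    zero, and then added to the RGB channel.
    """
    fused = []
    for rgb_ch, aux_ch in zip(rgb, aux):
        weight = max(0.0, cosine_similarity(rgb_ch, aux_ch))
        fused.append([r + weight * x for r, x in zip(rgb_ch, aux_ch)])
    return fused

# Channel 0: the two modalities agree (parallel vectors), so the
# auxiliary channel is added almost in full.
# Channel 1: the two modalities are orthogonal, so the auxiliary
# channel receives zero weight and the RGB channel passes through.
rgb = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
aux = [[2.0, 4.0, 6.0, 8.0], [0.0, 1.0, 0.0, 1.0]]
fused = fuse_channels(rgb, aux)
```

In this toy rule, the similarity acts as a gate: auxiliary channels that agree with the RGB features contribute more, and those that disagree contribute less. A real fusion module would typically operate on tensors and include learned layers.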

Results

Task                    Dataset          Metric        Value   Model
Semantic Segmentation   Cityscapes val   mIoU          76.36   CSFNet-2
Semantic Segmentation   Cityscapes val   mIoU          74.73   CSFNet-1
Semantic Segmentation   Cityscapes val   Frame (fps)   106.1   CSFNet-1
Semantic Segmentation   ZJU-RGB-P        mIoU          91.4    CSFNet-2
Semantic Segmentation   ZJU-RGB-P        mIoU          90.85   CSFNet-1
Semantic Segmentation   ZJU-RGB-P        Frame (fps)   108.5   CSFNet-1
Semantic Segmentation   MFN Dataset      mIoU          59.98   CSFNet-2
Semantic Segmentation   MFN Dataset      mIoU          56.05   CSFNet-1
Scene Segmentation      MFN Dataset      mIoU          59.98   CSFNet-2
Scene Segmentation      MFN Dataset      mIoU          56.05   CSFNet-1
2D Object Detection     MFN Dataset      mIoU          59.98   CSFNet-2
2D Object Detection     MFN Dataset      mIoU          56.05   CSFNet-1
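For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how mean intersection-over-union is commonly computed from a confusion matrix. The function name and the toy matrix are illustrative assumptions. The exact evaluation protocol for each benchmark (for example, which labels are ignored) is not stated on this page. The function returns a fraction, while the results above are percentages.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true otherwise
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

# Toy two-class example (values are made up for illustration).
confusion = [[8, 2],
             [1, 9]]
score = mean_iou(confusion)  # average of 8/11 and 9/12
```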
