CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes

Danial Qashqai, Emad Mousavian, Shahriar Baradaran Shokouhi, Sattar Mirzakuchaki

2024-07-01

Autonomous Vehicles, Thermal Image Segmentation, Real-Time Semantic Segmentation, Scene Understanding, Segmentation, Semantic Segmentation, Image Segmentation

Abstract

Semantic segmentation, as a crucial component of complex visual interpretation, plays a fundamental role in autonomous vehicle vision systems. Recent studies have significantly improved the accuracy of semantic segmentation by exploiting complementary information and developing multimodal methods. Despite the gains in accuracy, multimodal semantic segmentation methods suffer from high computational complexity and low inference speed. Therefore, it is a challenging task to implement multimodal methods in driving applications. To address this problem, we propose the Cosine Similarity Fusion Network (CSFNet) as a real-time RGB-X semantic segmentation model. Specifically, we design a Cosine Similarity Attention Fusion Module (CS-AFM) that effectively rectifies and fuses features of two modalities. The CS-AFM module leverages cross-modal similarity to achieve high generalization ability. By enhancing the fusion of cross-modal features at lower levels, CS-AFM paves the way for the use of a single-branch network at higher levels. Therefore, we use dual and single-branch architectures in an encoder, along with an efficient context module and a lightweight decoder for fast and accurate predictions. To verify the effectiveness of CSFNet, we use the Cityscapes, MFNet, and ZJU datasets for the RGB-D/T/P semantic segmentation. According to the results, CSFNet has competitive accuracy with state-of-the-art methods while being state-of-the-art in terms of speed among multimodal semantic segmentation models. It also achieves high efficiency due to its low parameter count and computational complexity. The source code for CSFNet will be available at https://github.com/Danial-Qashqai/CSFNet.
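The idea of weighting cross-modal features by their cosine similarity can be illustrated with a minimal sketch. This is not the authors' CS-AFM: the function names, the mapping of similarity to a weight in [0, 1], and the additive fusion rule are all assumptions chosen only to make the concept concrete. Each channel is represented as a flattened list of activations.

```python
import math

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def fuse_channels(rgb_feats, x_feats):
    """Fuse per-channel features of two modalities (hypothetical rule).

    rgb_feats, x_feats: lists of channels, each channel a flat list of floats.
    Returns the fused channels and the per-channel weights.
    """
    fused, weights = [], []
    for rgb_c, x_c in zip(rgb_feats, x_feats):
        # Assumption: map similarity from [-1, 1] to a weight in [0, 1].
        w = 0.5 * (1.0 + cosine_similarity(rgb_c, x_c))
        weights.append(w)
        # Assumption: add the X-modality channel scaled by its weight.
        fused.append([r + w * x for r, x in zip(rgb_c, x_c)])
    return fused, weights

rgb = [[1.0, 0.0], [1.0, 2.0]]
aux = [[2.0, 0.0], [-1.0, -2.0]]
fused, weights = fuse_channels(rgb, aux)
```

In this toy input the first channel pair is aligned, so the auxiliary channel is added almost in full, while the second pair points in opposite directions and is almost entirely suppressed.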

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	Cityscapes val	mIoU	76.36	CSFNet-2
Semantic Segmentation	Cityscapes val	mIoU	74.73	CSFNet-1
Semantic Segmentation	Cityscapes val	Frame (fps)	106.1	CSFNet-1
Semantic Segmentation	ZJU-RGB-P	mIoU	91.4	CSFNet-2
Semantic Segmentation	ZJU-RGB-P	mIoU	90.85	CSFNet-1
Semantic Segmentation	ZJU-RGB-P	Frame (fps)	108.5	CSFNet-1
Semantic Segmentation	MFN Dataset	mIoU	59.98	CSFNet-2
Semantic Segmentation	MFN Dataset	mIoU	56.05	CSFNet-1
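For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how mean intersection-over-union can be computed for one pair of flattened label maps. The function name `mean_iou` and the choice to skip classes absent from both maps are assumptions for illustration. Benchmark scores are typically accumulated over a whole dataset and reported as percentages, which this sketch does not do.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for flat lists of integer class ids."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # Assumption: ignore classes absent from both maps.
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```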
