CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers

Jiaming Zhang, Huayao Liu, Kailun Yang, Xinxin Hu, Ruiping Liu, Rainer Stiefelhagen

2022-03-09

Autonomous Vehicles, Thermal Image Segmentation, Camouflaged Object Segmentation, Scene Understanding, Segmentation, Semantic Segmentation, Multispectral Object Detection, Image Manipulation Localization, Pedestrian Detection, 3D Object Detection, Object Detection, Image Segmentation

Abstract

Scene understanding based on image segmentation is a crucial component of autonomous vehicles. Pixel-wise semantic segmentation of RGB images can be advanced by exploiting complementary features from a supplementary modality (X-modality). However, covering a wide variety of sensors with a modality-agnostic model remains an unresolved problem due to variations in sensor characteristics among different modalities. Unlike previous modality-specific methods, in this work, we propose a unified fusion framework, CMX, for RGB-X semantic segmentation. To generalize well across different modalities, which often include supplements as well as uncertainties, a unified cross-modal interaction is crucial for modality fusion. Specifically, we design a Cross-Modal Feature Rectification Module (CM-FRM) to calibrate bi-modal features by leveraging the features from one modality to rectify the features of the other modality. With rectified feature pairs, we deploy a Feature Fusion Module (FFM) to perform sufficient exchange of long-range contexts before mixing. To verify CMX, for the first time, we unify five modalities complementary to RGB, i.e., depth, thermal, polarization, event, and LiDAR. Extensive experiments show that CMX generalizes well to diverse multi-modal fusion, achieving state-of-the-art performance on five RGB-Depth benchmarks, as well as on RGB-Thermal, RGB-Polarization, and RGB-LiDAR datasets. In addition, to investigate the generalizability to dense-sparse data fusion, we establish an RGB-Event semantic segmentation benchmark based on the EventScape dataset, on which CMX sets the new state of the art. The source code of CMX is publicly available at https://github.com/huaaaliu/RGBX_Semantic_Segmentation.
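
To make the rectification idea concrete, the following is a minimal toy sketch in plain Python, not the authors' CM-FRM. It assumes a feature map is a list of channels, each a flat list of floats, and it replaces the learned weighting of a real module with a fixed sigmoid gate over globally pooled channel statistics. The names `channel_means`, `rectify`, and the mixing weight `lam` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(v):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def channel_means(feat):
    """Global average pooling: one mean per channel."""
    return [sum(ch) / len(ch) for ch in feat]


def rectify(rgb, x, lam=0.5):
    """Toy channel-wise cross-modal rectification (illustrative assumption).

    rgb, x: feature maps with the same shape, given as lists of channels,
    each channel a flat list of floats.
    Each modality receives a gated residual taken from the other modality.
    The gate is a fixed sigmoid of pooled statistics; a trained module
    would learn this weighting instead.
    """
    gate_from_x = [sigmoid(m) for m in channel_means(x)]
    gate_from_rgb = [sigmoid(m) for m in channel_means(rgb)]
    # RGB features corrected by gated X features, channel by channel.
    rgb_out = [[r + lam * g * v for r, v in zip(rc, xc)]
               for rc, xc, g in zip(rgb, x, gate_from_x)]
    # X features corrected by gated RGB features, channel by channel.
    x_out = [[v + lam * g * r for r, v in zip(rc, xc)]
             for rc, xc, g in zip(rgb, x, gate_from_rgb)]
    return rgb_out, x_out


# Two channels, two spatial positions per channel.
rgb = [[1.0, 2.0], [0.0, 0.0]]
x = [[0.0, 0.0], [2.0, 2.0]]
rgb_out, x_out = rectify(rgb, x)
```

In this toy input the second RGB channel is empty, so after rectification it is filled in from the corresponding X channel, while channels whose counterpart is all zeros pass through unchanged. The long-range context exchange of the FFM is not modeled here.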

Results

Task	Dataset	Metric	Value	Model
Autonomous Vehicles	DVTOD	mAP	81.6	CMX
Autonomous Vehicles	LLVIP	AP	0.596	CMX
Autonomous Vehicles	CVC14	AP50	68.9	CMX
Semantic Segmentation	US3D	mIoU	84.63	CMX
Semantic Segmentation	UPLight	mIoU	92.13	CMX (B2 RGB-AoLP)
Semantic Segmentation	UPLight	mIoU	92.07	CMX (B2 RGB-DoLP)
Semantic Segmentation	KITTI-360	mIoU	64.43	CMX (RGB-Depth)
Semantic Segmentation	KITTI-360	mIoU	64.31	CMX (RGB-LiDAR)
Semantic Segmentation	Porto	IoU	72.85	CMX
Semantic Segmentation	Replica	mIoU	17	CMX
Semantic Segmentation	DSEC	mIoU	72.42	CMX
Semantic Segmentation	SYN-UDTIRI	IoU	93.31	CMX
Semantic Segmentation	Synthetic Bathing Perception	mIoU	94.2	CMX-SRA
Semantic Segmentation	Synthetic Bathing Perception	mIoU	88.23	CMX
Semantic Segmentation	LLRGBD-synthetic	mIoU	66.52	CMX (SegFormer-B2)
Semantic Segmentation	Cityscapes val	mIoU	82.6	CMX (B4)
Semantic Segmentation	Cityscapes val	mIoU	81.6	CMX (B2)
Semantic Segmentation	SELMA	mIoU	91.7	CMX
Semantic Segmentation	ZJU-RGB-P	mIoU	92.6	CMX (B4 RGB-AoLP)
Semantic Segmentation	ZJU-RGB-P	mIoU	92.2	CMX (B2 RGB-DoLP)
Semantic Segmentation	DDD17	mIoU	71.88	CMX
Semantic Segmentation	Event-based Segmentation Dataset	mIoU	85.81	CMX
Semantic Segmentation	SpectralWaste	mIoU	58.2	CMX (RGB-HYPER)
Semantic Segmentation	SpectralWaste	mIoU	56.6	CMX (RGB-HYPER3)
Semantic Segmentation	Potsdam	mIoU	85.97	CMX
Semantic Segmentation	TLCGIS	IoU	84.14	CMX
Semantic Segmentation	DeLiVER	mIoU	62.67	CMX (RGB-Depth)
Semantic Segmentation	DeLiVER	mIoU	56.52	CMX (RGB-Event)
Semantic Segmentation	DeLiVER	mIoU	56.37	CMX (RGB-LiDAR)
Semantic Segmentation	EventScape	mIoU	64.28	CMX (B4)
Semantic Segmentation	EventScape	mIoU	61.9	CMX (B2)
Semantic Segmentation	GAMUS	mIoU	75.23	CMX
Semantic Segmentation	Vaihingen	mIoU	82.87	CMX
Semantic Segmentation	BJRoad	IoU	62.28	CMX
Semantic Segmentation	Stanford2D3D - RGBD	Pixel Accuracy	82.6	CMX (SegFormer-B4)
Semantic Segmentation	Stanford2D3D - RGBD	mIoU	62.1	CMX (SegFormer-B4)
Semantic Segmentation	Stanford2D3D - RGBD	Pixel Accuracy	82.3	CMX (SegFormer-B2)
Semantic Segmentation	Stanford2D3D - RGBD	mIoU	61.2	CMX (SegFormer-B2)
Semantic Segmentation	Noisy RS RGB-T Dataset	mIoU	56.1	CMX (B4)
Semantic Segmentation	KP day-night	mIoU	46.2	CMX
Semantic Segmentation	RGB-T-Glass-Segmentation	MAE	0.029	CMX
Semantic Segmentation	MFN Dataset	mIoU	59.7	CMX (B4)
Semantic Segmentation	MFN Dataset	mIoU	58.2	CMX (B2)
Object Detection	DSEC	mAP	29.1	CMX
Object Detection	InOutDoor	AP	62.3	CMX
Object Detection	EventPed	AP	58	CMX
Object Detection	PKU-DDD17-Car	mAP50	80.4	CMX
Object Detection	STCrowd	AP	61	CMX
Object Detection	PCOD_1200	S-Measure	0.922	CMX
3D	DSEC	mAP	29.1	CMX
3D	InOutDoor	AP	62.3	CMX
3D	EventPed	AP	58	CMX
3D	PKU-DDD17-Car	mAP50	80.4	CMX
3D	STCrowd	AP	61	CMX
3D	PCOD_1200	S-Measure	0.922	CMX
Camouflaged Object Segmentation	PCOD_1200	S-Measure	0.922	CMX
Object Segmentation	PCOD_1200	S-Measure	0.922	CMX
2D Classification	DSEC	mAP	29.1	CMX
2D Classification	InOutDoor	AP	62.3	CMX
2D Classification	EventPed	AP	58	CMX
2D Classification	PKU-DDD17-Car	mAP50	80.4	CMX
2D Classification	STCrowd	AP	61	CMX
2D Classification	PCOD_1200	S-Measure	0.922	CMX
Pedestrian Detection	DVTOD	mAP	81.6	CMX
Pedestrian Detection	LLVIP	AP	0.596	CMX
Pedestrian Detection	CVC14	AP50	68.9	CMX
Scene Segmentation	Noisy RS RGB-T Dataset	mIoU	56.1	CMX (B4)
Scene Segmentation	KP day-night	mIoU	46.2	CMX
Scene Segmentation	RGB-T-Glass-Segmentation	MAE	0.029	CMX
Scene Segmentation	MFN Dataset	mIoU	59.7	CMX (B4)
Scene Segmentation	MFN Dataset	mIoU	58.2	CMX (B2)
2D Object Detection	DSEC	mAP	29.1	CMX
2D Object Detection	InOutDoor	AP	62.3	CMX
2D Object Detection	EventPed	AP	58	CMX
2D Object Detection	PKU-DDD17-Car	mAP50	80.4	CMX
2D Object Detection	STCrowd	AP	61	CMX
2D Object Detection	PCOD_1200	S-Measure	0.922	CMX
2D Object Detection	Noisy RS RGB-T Dataset	mIoU	56.1	CMX (B4)
2D Object Detection	KP day-night	mIoU	46.2	CMX
2D Object Detection	RGB-T-Glass-Segmentation	MAE	0.029	CMX
2D Object Detection	MFN Dataset	mIoU	59.7	CMX (B4)
2D Object Detection	MFN Dataset	mIoU	58.2	CMX (B2)
Image Manipulation Localization	Columbia	Average Pixel F1(Fixed threshold)	0.884	CMX (RGB+NP++)
Image Manipulation Localization	Columbia	Average Pixel F1(Fixed threshold)	0.872	CMX (RGB+Bayar)
Image Manipulation Localization	Columbia	Average Pixel F1(Fixed threshold)	0.834	CMX (RGB+SRM)
Image Manipulation Localization	COVERAGE	Average Pixel F1(Fixed threshold)	0.63	CMX (RGB+SRM)
Image Manipulation Localization	COVERAGE	Average Pixel F1(Fixed threshold)	0.592	CMX (RGB+Bayar)
Image Manipulation Localization	COVERAGE	Average Pixel F1(Fixed threshold)	0.577	CMX (RGB+NP++)
Image Manipulation Localization	Casia V1+	Average Pixel F1(Fixed threshold)	0.791	CMX (RGB+SRM)
Image Manipulation Localization	Casia V1+	Average Pixel F1(Fixed threshold)	0.774	CMX (RGB+Bayar)
Image Manipulation Localization	Casia V1+	Average Pixel F1(Fixed threshold)	0.761	CMX (RGB+NP++)
Image Manipulation Localization	CocoGlide	Average Pixel F1(Fixed threshold)	0.585	CMX (RGB+SRM)
Image Manipulation Localization	CocoGlide	Average Pixel F1(Fixed threshold)	0.566	CMX (RGB+Bayar)
Image Manipulation Localization	CocoGlide	Average Pixel F1(Fixed threshold)	0.516	CMX (RGB+NP++)
Image Manipulation Localization	DSO-1	Average Pixel F1(Fixed threshold)	0.895	CMX (RGB+NP++)
Image Manipulation Localization	DSO-1	Average Pixel F1(Fixed threshold)	0.792	CMX (RGB+SRM)
Image Manipulation Localization	DSO-1	Average Pixel F1(Fixed threshold)	0.776	CMX (RGB+Bayar)
10-shot image generation	US3D	mIoU	84.63	CMX
10-shot image generation	UPLight	mIoU	92.13	CMX (B2 RGB-AoLP)
10-shot image generation	UPLight	mIoU	92.07	CMX (B2 RGB-DoLP)
10-shot image generation	KITTI-360	mIoU	64.43	CMX (RGB-Depth)
10-shot image generation	KITTI-360	mIoU	64.31	CMX (RGB-LiDAR)
10-shot image generation	Porto	IoU	72.85	CMX
10-shot image generation	Replica	mIoU	17	CMX
10-shot image generation	DSEC	mIoU	72.42	CMX
10-shot image generation	SYN-UDTIRI	IoU	93.31	CMX
10-shot image generation	Synthetic Bathing Perception	mIoU	94.2	CMX-SRA
10-shot image generation	Synthetic Bathing Perception	mIoU	88.23	CMX
10-shot image generation	LLRGBD-synthetic	mIoU	66.52	CMX (SegFormer-B2)
10-shot image generation	Cityscapes val	mIoU	82.6	CMX (B4)
10-shot image generation	Cityscapes val	mIoU	81.6	CMX (B2)
10-shot image generation	SELMA	mIoU	91.7	CMX
10-shot image generation	ZJU-RGB-P	mIoU	92.6	CMX (B4 RGB-AoLP)
10-shot image generation	ZJU-RGB-P	mIoU	92.2	CMX (B2 RGB-DoLP)
10-shot image generation	DDD17	mIoU	71.88	CMX
10-shot image generation	Event-based Segmentation Dataset	mIoU	85.81	CMX
10-shot image generation	SpectralWaste	mIoU	58.2	CMX (RGB-HYPER)
10-shot image generation	SpectralWaste	mIoU	56.6	CMX (RGB-HYPER3)
10-shot image generation	Potsdam	mIoU	85.97	CMX
10-shot image generation	TLCGIS	IoU	84.14	CMX
10-shot image generation	DeLiVER	mIoU	62.67	CMX (RGB-Depth)
10-shot image generation	DeLiVER	mIoU	56.52	CMX (RGB-Event)
10-shot image generation	DeLiVER	mIoU	56.37	CMX (RGB-LiDAR)
10-shot image generation	EventScape	mIoU	64.28	CMX (B4)
10-shot image generation	EventScape	mIoU	61.9	CMX (B2)
10-shot image generation	GAMUS	mIoU	75.23	CMX
10-shot image generation	Vaihingen	mIoU	82.87	CMX
10-shot image generation	BJRoad	IoU	62.28	CMX
10-shot image generation	Stanford2D3D - RGBD	Pixel Accuracy	82.6	CMX (SegFormer-B4)
10-shot image generation	Stanford2D3D - RGBD	mIoU	62.1	CMX (SegFormer-B4)
10-shot image generation	Stanford2D3D - RGBD	Pixel Accuracy	82.3	CMX (SegFormer-B2)
10-shot image generation	Stanford2D3D - RGBD	mIoU	61.2	CMX (SegFormer-B2)
10-shot image generation	Noisy RS RGB-T Dataset	mIoU	56.1	CMX (B4)
10-shot image generation	KP day-night	mIoU	46.2	CMX
10-shot image generation	RGB-T-Glass-Segmentation	MAE	0.029	CMX
10-shot image generation	MFN Dataset	mIoU	59.7	CMX (B4)
10-shot image generation	MFN Dataset	mIoU	58.2	CMX (B2)
16k	DSEC	mAP	29.1	CMX
16k	InOutDoor	AP	62.3	CMX
16k	EventPed	AP	58	CMX
16k	PKU-DDD17-Car	mAP50	80.4	CMX
16k	STCrowd	AP	61	CMX
16k	PCOD_1200	S-Measure	0.922	CMX
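
Most segmentation rows above report mIoU (mean intersection over union). As a reminder of what that number measures, here is a minimal sketch in plain Python, assuming flat lists of integer class labels for prediction and ground truth. The function name `mean_iou` is hypothetical, and benchmark protocols differ in details this sketch leaves out, such as ignore labels and accumulating counts over a whole dataset before averaging.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: equal-length flat lists of integer class labels.
    Classes absent from both lists are skipped so they do not
    distort the average (one common convention, assumed here).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)
# Class 0: IoU = 1/2; class 1: IoU = 2/3; mean = 7/12.
```

Values such as 84.63 in the table correspond to this quantity expressed as a percentage.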

