Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation

Harsh Maheshwari, Yen-Cheng Liu, Zsolt Kira

2023-04-21

Tasks: Semi-Supervised Semantic Segmentation, Semi-Supervised RGBD Semantic Segmentation, Segmentation, Semantic Segmentation, Robust Semi-Supervised RGBD Semantic Segmentation


Abstract

Using multiple spatial modalities has been proven helpful in improving semantic segmentation performance. However, several real-world challenges have yet to be addressed: (a) improving label efficiency and (b) enhancing robustness in realistic scenarios where modalities are missing at test time. To address these challenges, we first propose a simple yet efficient multi-modal fusion mechanism, Linear Fusion, that performs better than state-of-the-art multi-modal models even with limited supervision. Second, we propose M3L: Multi-modal Teacher for Masked Modality Learning, a semi-supervised framework that not only improves multi-modal performance but also uses unlabeled data to make the model robust to the realistic missing-modality scenario. We create the first benchmark for semi-supervised multi-modal semantic segmentation and also report robustness to missing modalities. Our proposal shows an absolute improvement of up to 10% in robust mIoU over the most competitive baselines. Our code is available at https://github.com/harshm121/M3L
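The abstract names the two components without detailing them. The NumPy sketch below illustrates the general ideas under explicit assumptions and is not the authors' implementation (the official repository linked above is the reference). `linear_fusion` assumes the features of the two modality encoders are concatenated per pixel and projected back by one linear layer; `mask_modality` assumes masked modality learning randomly zeroes out one modality in the student's input while the teacher sees both and supplies pseudo-labels on unlabeled data; `ema_update` assumes a mean-teacher-style exponential moving average. All function names, shapes, the masking probability, and the momentum value are hypothetical.

```python
import numpy as np


def linear_fusion(feat_rgb, feat_depth, weight, bias):
    """Hypothetical fusion: concatenate RGB and depth features along channels
    and project back to C channels with one linear (1x1-conv-like) layer.

    feat_rgb, feat_depth: (H, W, C); weight: (2C, C); bias: (C,).
    """
    stacked = np.concatenate([feat_rgb, feat_depth], axis=-1)  # (H, W, 2C)
    return stacked @ weight + bias                              # (H, W, C)


def mask_modality(feat_rgb, feat_depth, rng, p_mask=0.5):
    """With probability p_mask, zero out one modality chosen uniformly at random."""
    if rng.random() < p_mask:
        if rng.random() < 0.5:
            return np.zeros_like(feat_rgb), feat_depth
        return feat_rgb, np.zeros_like(feat_depth)
    return feat_rgb, feat_depth


def forward(params, feat_rgb, feat_depth):
    """Fuse the two modalities, then apply a linear per-pixel classifier."""
    fused = linear_fusion(feat_rgb, feat_depth, params["w_fuse"], params["b_fuse"])
    return fused @ params["w_cls"]  # (H, W, K) class logits


def pseudo_label_loss(student_logits, teacher_logits):
    """Cross-entropy of student predictions against the teacher's argmax labels."""
    labels = teacher_logits.argmax(axis=-1)  # (H, W) hard pseudo-labels
    shifted = student_logits - student_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return float(-picked.mean())


def ema_update(teacher, student, momentum=0.99):
    """Mean-teacher update: teacher <- m * teacher + (1 - m) * student."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k] for k in teacher}


# Toy setup: random feature maps stand in for per-modality encoder outputs.
rng = np.random.default_rng(0)
H, W, C, K = 4, 4, 8, 3
student = {
    "w_fuse": rng.normal(size=(2 * C, C)),
    "b_fuse": np.zeros(C),
    "w_cls": rng.normal(size=(C, K)),
}
teacher = {k: v.copy() for k, v in student.items()}
rgb, depth = rng.normal(size=(H, W, C)), rng.normal(size=(H, W, C))

# Teacher sees both modalities; the student sees a possibly masked input.
teacher_logits = forward(teacher, rgb, depth)
student_logits = forward(student, *mask_modality(rgb, depth, rng))
loss = pseudo_label_loss(student_logits, teacher_logits)
# A real training loop would backpropagate `loss` into the student here.
teacher = ema_update(teacher, student)
```

Training the student on masked inputs against pseudo-labels produced from complete inputs is what, under these assumptions, pushes the model to stay accurate when a modality is absent at test time.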

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	SUN-RGBD	Mean IoU (test)	48.17	DFormer-L
Semantic Segmentation	Stanford2D3D - RGBD	mIoU	57.16	Linear Fusion (Segformer B2)
Semantic Segmentation	2D-3D-S	mIoU (0.1% labels)	40.05	M3L (Linear Fusion B2)
Semantic Segmentation	2D-3D-S	mIoU (0.2% labels)	44.62	M3L (Linear Fusion B2)
Semantic Segmentation	2D-3D-S	mIoU (1% labels)	49.28	M3L (Linear Fusion B2)
Semantic Segmentation	Stanford 2D-3D	MM-Robust mIoU (0.1% labels)	41.36	M3L (Linear Fusion - Segformer B2)
Semantic Segmentation	Stanford 2D-3D	mIoU (0.1% labels)	44.1	M3L (Linear Fusion - Segformer B2)
Semantic Segmentation	Stanford 2D-3D	mIoU (0.1% labels)	41.7	Mean Teacher (Linear Fusion - Segformer B2)
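Semantic segmentation benchmarks commonly compute mIoU from a confusion matrix accumulated over the whole test set rather than by averaging per-image scores. The sketch below does that, and also shows one plausible reading of the MM-Robust mIoU row: we assume, without having checked the paper's exact definition, that it averages mIoU over the test-time settings with both modalities, RGB only, and depth only. The setting names, `ignore_index=255`, and the toy arrays are hypothetical.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows index ground-truth classes, columns index predicted classes."""
    valid = target != ignore_index
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN), averaged over classes that occur."""
    tp = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - np.diag(conf)
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / union  # NaN for classes absent from both pred and target
    return float(np.nanmean(iou))


def robust_miou(miou_per_setting):
    """Assumed definition: plain average of mIoU over test-time modality settings."""
    return sum(miou_per_setting.values()) / len(miou_per_setting)


# Toy usage: one confusion matrix per setting, then mIoU per setting,
# then the average across settings.
target = np.array([0, 0, 1, 1, 2, 255])  # last pixel is ignored
preds = {
    "rgb+depth": np.array([0, 0, 1, 1, 2, 0]),
    "rgb_only": np.array([0, 1, 1, 1, 2, 0]),
    "depth_only": np.array([0, 1, 1, 2, 2, 0]),
}
miou = {k: mean_iou(confusion_matrix(p, target, num_classes=3)) for k, p in preds.items()}
print(miou, robust_miou(miou))
```

In a real evaluation the confusion matrix for each setting would be summed over every test image before `mean_iou` is called.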
