LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing

Tong Wang, Guanzhou Chen, Xiaodong Zhang, Chenxi Liu, Xiaoliang Tan, Jiaqi Wang, Chanjuan He, Wenlin Zhou

2024-04-21
Semantic Segmentation, Land Cover Classification

Abstract

Despite the rapid evolution of semantic segmentation for land cover classification in high-resolution remote sensing imagery, integrating multiple data modalities such as Digital Surface Model (DSM), RGB, and Near-infrared (NIR) remains a challenge. Current methods often process only two types of data and so miss the information that additional modalities can provide. To address this gap, we propose a novel Lightweight Multimodal data Fusion Network (LMFNet) for the fusion and semantic segmentation of multimodal remote sensing images. LMFNet accommodates several data types simultaneously, including RGB, NirRG, and DSM, through a weight-sharing, multi-branch vision transformer that minimizes parameter count while ensuring robust feature extraction. Our proposed multimodal fusion module integrates a Multimodal Feature Fusion Reconstruction Layer and a Multimodal Feature Self-Attention Fusion Layer, which reconstruct and fuse multimodal features. Extensive testing on the public US3D, ISPRS Potsdam, and ISPRS Vaihingen datasets demonstrates the effectiveness of LMFNet. Specifically, it achieves a mean Intersection over Union (mIoU) of 85.09% on the US3D dataset, a significant improvement over existing methods. Compared to unimodal approaches, LMFNet improves mIoU by 10% with only a 0.5M increase in parameter count. Furthermore, compared to bimodal methods, our approach with three input modalities improves mIoU by 0.46 percentage points.
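The abstract reports its results as mean Intersection over Union (mIoU). As a reminder of what that metric measures, here is a minimal pure-Python sketch of the standard definition: per-class IoU = TP / (TP + FP + FN), averaged over classes. It is an illustration only, not the authors' evaluation code; the function names `confusion_matrix` and `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are all assumptions made here.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int],
             num_classes: int) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes that appear in neither labels nor predictions
    are skipped rather than counted as zero.
    """
    cm = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # predicted c, wrong
        fn = sum(cm[c]) - tp                                  # true c, missed
        denom = tp + fp + fn
        if denom == 0:
            continue
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six flattened "pixels".
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
score = mean_iou(labels, preds, num_classes=3)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so `score` is 0.5. Benchmark protocols differ in which classes are included in the average (for example, whether a clutter or background class counts), so figures such as those reported above are only comparable under the same protocol.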

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	US3D	mIoU	85.09	LMFNet-3
Semantic Segmentation	US3D	mIoU	84.5	LMFNet-2
Semantic Segmentation	Potsdam	mIoU	86.39	LMFNet-3
Semantic Segmentation	Potsdam	mIoU	85.51	LMFNet-2
Semantic Segmentation	Vaihingen	mIoU	82.49	LMFNet-2 (
