Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance

Jiayi Zhao, Fei Teng, Kai Luo, Guoqiang Zhao, Zhiyong Li, Xu Zheng, Kailun Yang

2025-03-04

Thermal Image Segmentation, Semantic Segmentation

Abstract

The perception capability of robotic systems relies on the richness of the dataset. Although Segment Anything Model 2 (SAM2), trained on large datasets, shows strong potential in perception tasks, its inherent training paradigm makes it unsuitable for RGB-T tasks. To address this challenge, we propose SHIFNet, a novel SAM2-driven Hybrid Interaction Paradigm that unlocks the potential of SAM2 with linguistic guidance for efficient RGB-Thermal perception. Our framework consists of two key components: (1) a Semantic-Aware Cross-modal Fusion (SACF) module that dynamically balances modality contributions through text-guided affinity learning, overcoming SAM2's inherent RGB bias; and (2) a Heterogeneous Prompting Decoder (HPD) that enhances global semantic information through a semantic enhancement module and then combines it with category embeddings to amplify cross-modal semantic consistency. With 32.27M trainable parameters, SHIFNet achieves state-of-the-art segmentation performance on public benchmarks, reaching 89.8% mIoU on PST900 and 67.8% mIoU on FMB. The framework facilitates the adaptation of pre-trained large models to RGB-T segmentation tasks, mitigating the high cost of data collection while giving robotic systems comprehensive perception capabilities. The source code will be made publicly available at https://github.com/iAsakiT3T/SHIFNet.
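To make the idea of text-guided modality balancing more concrete, here is a minimal sketch in plain Python. It is not the SACF module from the paper: the function name `text_guided_fusion`, the use of cosine similarity as the "affinity", the softmax weighting, and the `temperature` parameter are all assumptions chosen for illustration. It only shows how a text embedding could, in principle, shift the weight between an RGB feature and a thermal feature.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def text_guided_fusion(rgb_feat, thermal_feat, text_emb, temperature=0.1):
    """Hypothetical fusion: weight each modality by its affinity to the text.

    Assumption (not from the paper): affinity is cosine similarity, and the
    two affinities are turned into weights with a temperature-scaled softmax.
    """
    affinities = [cosine(rgb_feat, text_emb), cosine(thermal_feat, text_emb)]
    peak = max(affinities)  # subtract the max for numerical stability
    exps = [math.exp((a - peak) / temperature) for a in affinities]
    total = sum(exps)
    w_rgb, w_thermal = (e / total for e in exps)
    fused = [w_rgb * r + w_thermal * t for r, t in zip(rgb_feat, thermal_feat)]
    return fused, (w_rgb, w_thermal)


# Toy vectors: the text embedding points mostly along the thermal feature.
rgb = [1.0, 0.0]
thermal = [0.0, 1.0]
text = [0.2, 1.0]
fused, (w_rgb, w_thermal) = text_guided_fusion(rgb, thermal, text)
print(w_thermal > w_rgb)
```

In this toy case the thermal feature receives the larger weight because it is better aligned with the text embedding, which is the kind of behavior one would want when counteracting a bias toward RGB.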

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	FMB Dataset	mIoU	67.8	SHIFNet (RGB-Infrared)
Semantic Segmentation	PST900	mIoU	89.8	SHIFNet
Semantic Segmentation	MFN Dataset	mIoU	59.2	SHIFNet
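
The metric reported above, mIoU (mean intersection over union), averages the per-class overlap between predicted and ground-truth labels. The following is a minimal sketch of that computation on flattened label lists; the function name `mean_iou` and the toy labels are illustrative. Benchmark evaluation code typically accumulates a confusion matrix over the whole test set and may treat absent or ignored classes differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both pred and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Six pixels, three classes (toy data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
# Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2.
print(round(100 * mean_iou(pred, target, 3), 1))  # prints 50.0
```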
