
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion

Jiahang Li, Peng Yun, Qijun Chen, Rui Fan

2024-04-04 | Scene Parsing | Thermal Image Segmentation | Semantic Segmentation

Abstract

Data-fusion networks have shown significant promise for RGB-thermal scene parsing. However, the majority of existing studies have relied on symmetric duplex encoders for heterogeneous feature extraction and fusion, paying inadequate attention to the inherent differences between RGB and thermal modalities. Recent progress in vision foundation models (VFMs) trained through self-supervision on vast amounts of unlabeled data has proven their ability to extract informative, general-purpose features. However, this potential has yet to be fully leveraged in the domain. In this study, we take one step toward this new research area by exploring a feasible strategy to fully exploit VFM features for RGB-thermal scene parsing. Specifically, we delve deeper into the unique characteristics of RGB and thermal modalities, thereby designing a hybrid, asymmetric encoder that incorporates both a VFM and a convolutional neural network. This design allows for more effective extraction of complementary heterogeneous features, which are subsequently fused in a dual-path, progressive manner. Moreover, we introduce an auxiliary task to further enrich the local semantics of the fused features, thereby improving the overall performance of RGB-thermal scene parsing. Our proposed HAPNet, equipped with all these components, demonstrates superior performance compared to all other state-of-the-art RGB-thermal scene parsing networks, achieving top ranks across three widely used public RGB-thermal scene parsing datasets. We believe this new paradigm has opened up new opportunities for future developments in data-fusion scene parsing approaches.

Results

Task                      Dataset        Metric         Value  Model
Semantic Segmentation     NYU Depth v2   Mean Accuracy  68.8   HAPNet
Semantic Segmentation     NYU Depth v2   Mean IoU       55     HAPNet
Semantic Segmentation     KP day-night   mIoU           57.6   HAPNet
Semantic Segmentation     PST900         mIoU           89     HAPNet
Semantic Segmentation     MFN Dataset    mIoU           61.5   HAPNet
Scene Segmentation        KP day-night   mIoU           57.6   HAPNet
Scene Segmentation        PST900         mIoU           89     HAPNet
Scene Segmentation        MFN Dataset    mIoU           61.5   HAPNet
2D Object Detection       KP day-night   mIoU           57.6   HAPNet
2D Object Detection       PST900         mIoU           89     HAPNet
2D Object Detection       MFN Dataset    mIoU           61.5   HAPNet
10-shot image generation  NYU Depth v2   Mean Accuracy  68.8   HAPNet
10-shot image generation  NYU Depth v2   Mean IoU       55     HAPNet
10-shot image generation  KP day-night   mIoU           57.6   HAPNet
10-shot image generation  PST900         mIoU           89     HAPNet
10-shot image generation  MFN Dataset    mIoU           61.5   HAPNet
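
The results above are reported as mean IoU (mIoU) and Mean Accuracy, in percent. As a minimal sketch of how these two metrics are commonly defined for semantic segmentation, the following pure-Python example computes both from a confusion matrix. The function names and the toy labels are illustrative assumptions; the page does not describe the exact evaluation protocol used for HAPNet (for example, which labels are ignored on each dataset), so this should not be read as the authors' evaluation code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # ground-truth c predicted as another class
        fp = sum(row[c] for row in cm) - tp   # other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious)


def mean_accuracy(cm):
    """Average of per-class accuracy = TP / (number of ground-truth pixels)."""
    accs = []
    for c in range(len(cm)):
        total = sum(cm[c])
        if total > 0:  # skip classes with no ground-truth pixels
            accs.append(cm[c][c] / total)
    return sum(accs) / len(accs)


# Toy example: six flattened "pixels" and three classes (made-up labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print("mIoU (%):", round(100 * mean_iou(cm), 1))
print("Mean Accuracy (%):", round(100 * mean_accuracy(cm), 1))
```

Both metrics average over classes rather than over pixels, so a rare class counts as much as a frequent one.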

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)