Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion

Jianxin Huang, Jiahang Li, Ning Jia, Yuxiang Sun, Chengju Liu, Qijun Chen, Rui Fan

2024-07-31 · Scene Parsing · Thermal Image Segmentation · Semantic Segmentation

Abstract

Task-specific data-fusion networks have marked considerable achievements in urban scene parsing. Among these networks, our recently proposed RoadFormer successfully extracts heterogeneous features from RGB images and surface normal maps and fuses these features through attention mechanisms, demonstrating compelling efficacy in RGB-Normal road scene parsing. However, its performance significantly deteriorates when handling other types/sources of data or performing more universal, all-category scene parsing tasks. To overcome these limitations, this study introduces RoadFormer+, an efficient, robust, and adaptable model capable of effectively fusing RGB-X data, where "X" represents additional types/modalities of data such as depth, thermal, surface normal, and polarization. Specifically, we propose a novel hybrid feature decoupling encoder to extract heterogeneous features and decouple them into global and local components. These decoupled features are then fused through a dual-branch multi-scale heterogeneous feature fusion block, which employs parallel Transformer attentions and convolutional neural network modules to merge multi-scale features across different scales and receptive fields. The fused features are subsequently fed into a decoder to generate the final semantic predictions. Notably, our proposed RoadFormer+ ranks first on the KITTI Road benchmark and achieves state-of-the-art performance in mean intersection over union on the Cityscapes, MFNet, FMB, and ZJU datasets. Moreover, it reduces the number of learnable parameters by 65% compared to RoadFormer. Our source code will be publicly available at mias.group/RoadFormerPlus.
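The dual-branch fusion described above (parallel Transformer attention for global context, a CNN module for local receptive fields, merged into a single feature map) can be sketched in PyTorch. This is a minimal illustrative sketch, not the authors' implementation; all class and parameter names here (DualBranchFusion, the summation-based input fusion, the 1x1 merge projection) are hypothetical assumptions.

```python
# Hypothetical sketch of a dual-branch heterogeneous feature fusion block,
# loosely following the abstract's description. NOT the authors' code.
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Global branch: multi-head self-attention over flattened spatial tokens
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        # Local branch: depthwise-separable convolution for local detail
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # 1x1 projection merging the two branches (assumed merge strategy)
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb_feat: torch.Tensor, x_feat: torch.Tensor) -> torch.Tensor:
        # Combine the heterogeneous features (e.g. RGB and depth/thermal/AoLP);
        # simple summation is an assumption for illustration.
        f = rgb_feat + x_feat
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)          # (B, H*W, C)
        g, _ = self.attn(tokens, tokens, tokens)        # global branch
        g = self.norm(g + tokens).transpose(1, 2).reshape(b, c, h, w)
        l = self.conv(f)                                # local branch
        return self.merge(torch.cat([g, l], dim=1))

fused = DualBranchFusion(64)(torch.randn(1, 64, 16, 16),
                             torch.randn(1, 64, 16, 16))
print(tuple(fused.shape))  # → (1, 64, 16, 16)
```

In the full model, one such block per scale would merge multi-scale encoder features before the decoder produces the semantic predictions.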

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | SYN-UDTIRI | IoU | 94.11 | RoadFormer+ (B)
Semantic Segmentation | ZJU-RGB-P | mIoU | 93.0 | RoadFormer+ (ConvNeXt-L, RGB-AoLP)
Semantic Segmentation | ZJU-RGB-P | mIoU | 92.9 | RoadFormer+ (ConvNeXt-B, RGB-AoLP)
Semantic Segmentation | FMB Dataset | mIoU | 73.1 | RoadFormer+ (RGB-Infrared)
Semantic Segmentation | MFNet Dataset | mIoU | 62.7 | RoadFormer+ (ConvNeXt-L)

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)