Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


UniRGB-IR: A Unified Framework for RGB-Infrared Semantic Tasks via Adapter Tuning

Maoxun Yuan, Bo Cui, Tianyi Zhao, Jiayi Wang, Shan Fu, Xingxing Wei

2024-04-26 · Thermal Image Segmentation · Multispectral Object Detection · Pedestrian Detection
Paper · PDF · Code (official)

Abstract

Semantic analysis of visible (RGB) and infrared (IR) images has gained attention because it is more accurate and robust under low-illumination and complex weather conditions. Due to the lack of pre-trained foundation models on large-scale infrared image datasets, existing methods prefer to design task-specific frameworks and directly fine-tune them with pre-trained foundation models on their RGB-IR semantic relevance datasets, which results in poor scalability and limited generalization. In this work, we propose a general and efficient framework called UniRGB-IR to unify RGB-IR semantic tasks, in which a novel adapter is developed to efficiently introduce richer RGB-IR features into the pre-trained RGB-based foundation model. Specifically, our framework consists of an RGB-based foundation model, a Multi-modal Feature Pool (MFP) module, and a Supplementary Feature Injector (SFI) module. The MFP and SFI modules cooperate as an adapter to effectively complement the RGB-based features with rich RGB-IR features. During training, we freeze the entire foundation model to inherit its prior knowledge and optimize only the proposed adapter. Furthermore, to verify the effectiveness of our framework, we use the vanilla vision transformer (ViT-Base) as the pre-trained foundation model and perform extensive experiments. Experimental results on various RGB-IR downstream tasks demonstrate that our method achieves state-of-the-art performance. The source code and results are available at https://github.com/PoTsui99/UniRGB-IR.git.
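The training recipe described above — freeze every parameter of the pre-trained foundation model and expose only the adapter (MFP + SFI) parameters to the optimizer — can be sketched in a few lines. This is a minimal framework-free illustration; the class names (`ViTBackbone`, `Adapter`) and parameter counts are placeholders, not taken from the official repository.

```python
# Sketch of adapter tuning as described in the abstract: the foundation
# model is frozen to inherit its prior knowledge, and only the adapter
# parameters are handed to the optimizer. All names here are illustrative.

class Param:
    """Stand-in for a trainable tensor with a requires_grad flag."""
    def __init__(self, value):
        self.value = value
        self.requires_grad = True

class ViTBackbone:
    """Stand-in for the pre-trained RGB-based foundation model (ViT-Base)."""
    def __init__(self, n_params=4):
        self.params = [Param(0.1) for _ in range(n_params)]

class Adapter:
    """Stand-in for the MFP + SFI adapter that injects RGB-IR features."""
    def __init__(self, n_params=2):
        self.params = [Param(0.0) for _ in range(n_params)]

def trainable_parameters(backbone, adapter):
    # Freeze the entire foundation model...
    for p in backbone.params:
        p.requires_grad = False
    # ...and return only the adapter parameters for optimization.
    return [p for p in adapter.params if p.requires_grad]

backbone, adapter = ViTBackbone(), Adapter()
opt_params = trainable_parameters(backbone, adapter)
print(len(opt_params))  # only the adapter's parameters reach the optimizer
```

In a real PyTorch implementation the same effect is achieved by setting `requires_grad_(False)` on the backbone and passing only the adapter's parameters to the optimizer; the point of the sketch is that the backbone's weights never change, so the prior knowledge of the RGB foundation model is preserved.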

Results

Task | Dataset | Metric | Value | Model
Pedestrian Detection | LLVIP | AP | 0.632 | UniRGB-IR
Semantic Segmentation | PST900 | mIoU | 82.8 | UniRGB-IR
Semantic Segmentation | MFN Dataset | mIoU | 59.3 | UniRGB-IR
Multispectral Object Detection | KAIST Multispectral Pedestrian Detection Benchmark | All Miss Rate | 25.21 | UniRGB-IR

Related Papers

YOLO-APD: Enhancing YOLOv8 for Robust Pedestrian Detection on Complex Road Geometries (2025-07-07)
YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework (2025-06-17)
Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras (2025-05-23)
Multispectral Detection Transformer with Infrared-Centric Sensor Fusion (2025-05-21)
Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation (2025-05-11)
Attention-Aware Multi-View Pedestrian Tracking (2025-04-03)
Panoramic Distortion-Aware Tokenization for Person Detection and Localization Using Transformers in Overhead Fisheye Images (2025-03-18)
Enhanced Multi-View Pedestrian Detection Using Probabilistic Occupancy Volume (2025-03-14)