Cross-Modality Fusion Transformer for Multispectral Object Detection

Fang Qingyun, Han Dapeng, Wang Zhaokui

2021-10-30Multispectral Object Detection Pedestrian Detection object-detection Object Detection

Abstract

Multispectral image pairs can provide the combined information, making object detection applications more reliable and robust in the open world. To fully exploit the different modalities, we present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT) in this paper. Unlike prior CNNs-based works, guided by the transformer scheme, our network learns long-range dependencies and integrates global contextual information in the feature extraction stage. More importantly, by leveraging the self attention of the transformer, the network can naturally carry out simultaneous intra-modality and inter-modality fusion, and robustly capture the latent interactions between RGB and Thermal domains, thereby significantly improving the performance of multispectral object detection. Extensive experiments and ablation studies on multiple datasets demonstrate that our approach is effective and achieves state-of-the-art detection performance. Our code and models are available at https://github.com/DocF/multispectral-object-detection.

Results

Task	Dataset	Metric	Value	Model
Autonomous Vehicles	DVTOD	mAP	82.7	CFT
Autonomous Vehicles	LLVIP	AP	0.636	CFT
Autonomous Vehicles	CVC14	AP50	78.2	CFT
Pedestrian Detection	DVTOD	mAP	82.7	CFT
Pedestrian Detection	LLVIP	AP	0.636	CFT
Pedestrian Detection	CVC14	AP50	78.2	CFT
Multispectral Object Detection	LLVIP	mAP50	97.5	CFT

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains2025-07-17 RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images2025-07-17 Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection2025-07-17 Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis2025-07-17 Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios2025-07-16 Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping2025-07-15 ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge2025-07-08 YOLO-APD: Enhancing YOLOv8 for Robust Pedestrian Detection on Complex Road Geometries2025-07-07