Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection

Maoxun Yuan, Yinyan Wang, Xingxing Wei

2022-09-28

Tags: Object Detection In Aerial Images, Crowd Counting, Cross-Modal Alignment, Translation, Multispectral Object Detection, Pedestrian Detection, 2D Object Detection, Salient Object Detection, Object Detection

Abstract

Integrating multispectral data in object detection, especially visible and infrared images, has received great attention in recent years. Since visible (RGB) and infrared (IR) images provide complementary information for handling light variations, paired images are used in many fields, such as multispectral pedestrian detection, RGB-IR crowd counting and RGB-IR salient object detection. Compared with natural RGB-IR images, we find that detection in aerial RGB-IR images suffers from cross-modal weak misalignment, which manifests as deviations in the position, size and angle of the same object. In this paper, we mainly address the challenge of cross-modal weak misalignment in aerial RGB-IR images. Specifically, we first explain and analyze the cause of the weak misalignment problem. Then, we propose a Translation-Scale-Rotation Alignment (TSRA) module that addresses the problem by calibrating the feature maps from the two modalities. The module predicts the deviation between the objects in the two modalities through an alignment process and uses a Modality-Selection (MS) strategy to improve alignment performance. Finally, a two-stream feature alignment detector (TSFADet) based on the TSRA module is constructed for RGB-IR object detection in aerial images. With comprehensive experiments on the public DroneVehicle dataset, we verify that our method reduces the effect of cross-modal misalignment and achieves robust detection results.
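
To make the three kinds of deviation named in the abstract (position, size, angle) concrete, the following is a minimal sketch of how a translation, scale and rotation offset could be applied to an oriented bounding box. It is not the TSRA module and does not operate on feature maps; the `OrientedBox` and `Deviation` types, the `(cx, cy, w, h, angle)` parametrization and the function names are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class OrientedBox:
    """Hypothetical oriented box: center, size, and angle in radians."""
    cx: float
    cy: float
    w: float
    h: float
    angle: float


@dataclass
class Deviation:
    """Hypothetical cross-modal deviation of one object between modalities."""
    dx: float = 0.0      # translation along x (pixels)
    dy: float = 0.0      # translation along y (pixels)
    sw: float = 1.0      # multiplicative scale on width
    sh: float = 1.0      # multiplicative scale on height
    dtheta: float = 0.0  # rotation offset (radians)


def apply_deviation(box: OrientedBox, dev: Deviation) -> OrientedBox:
    """Shift the center, rescale the size, and rotate the box by the deviation."""
    return OrientedBox(
        cx=box.cx + dev.dx,
        cy=box.cy + dev.dy,
        w=box.w * dev.sw,
        h=box.h * dev.sh,
        angle=box.angle + dev.dtheta,
    )


def corners(box: OrientedBox) -> list[tuple[float, float]]:
    """Return the four corner points of the box, rotated about its center."""
    c, s = math.cos(box.angle), math.sin(box.angle)
    hw, hh = box.w / 2.0, box.h / 2.0
    local = [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]
    return [(box.cx + x * c - y * s, box.cy + x * s + y * c) for x, y in local]


if __name__ == "__main__":
    # A box annotated in one modality, and an assumed deviation to the other.
    reference_box = OrientedBox(cx=100.0, cy=50.0, w=40.0, h=20.0, angle=0.0)
    deviation = Deviation(dx=3.0, dy=-2.0, sw=1.1, sh=0.9, dtheta=math.pi / 2)
    calibrated = apply_deviation(reference_box, deviation)
    print(calibrated)
    print(corners(calibrated))
```

In this sketch the deviation values are supplied by hand; in the paper they are predicted by the alignment process, whose details are not given in the abstract.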

Results

Task	Dataset	Metric	Value	Model
2D Object Detection	DroneVehicle	Val/mAP50	73.1	TSFADet
2D Object Detection	DroneVehicle	test/mAP50	70.4	TSFADet
Multispectral Object Detection	KAIST Multispectral Pedestrian Detection Benchmark	All Miss Rate	30.74	TSFADet
