MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection

Heitor R. Medeiros, David Latortue, Eric Granger, Marco Pedersoli

2024-04-29
Tasks: Autonomous Driving, Multispectral Object Detection, Pedestrian Detection, Object Detection

Abstract

In real-world scenarios, using multiple modalities such as visible (RGB) and infrared (IR) can greatly improve the performance of a predictive task such as object detection (OD). Multimodal learning is a common way to leverage these modalities: multiple modality-specific encoders and a fusion module are combined to improve performance. In this paper, we study a different way to use the RGB and IR modalities, in which a single shared vision encoder observes either one modality or the other. This realistic setting requires a smaller memory footprint and is better suited to applications such as autonomous driving and surveillance, which commonly rely on RGB and IR data. However, when a single encoder is trained on multiple modalities, one modality can dominate the other, producing uneven recognition results across modalities. This work investigates how to efficiently leverage RGB and IR modalities to train a common transformer-based OD vision encoder while countering the effects of modality imbalance. To this end, we introduce a novel training technique that Mixes Patches (MiPa) from the two modalities, together with a patch-wise modality-agnostic module, to learn a common representation of both modalities. Our experiments show that MiPa learns a representation that reaches competitive results on traditional RGB/IR benchmarks while requiring only a single modality at inference. Our code is available at: https://github.com/heitorrapela/MiPa.
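
The patch-mixing idea described above can be illustrated with a minimal NumPy sketch. It assumes a co-registered RGB/IR image pair of identical shape (with IR replicated to three channels), non-overlapping square patches matching the transformer's patch size, and an independent per-patch Bernoulli draw for which modality each patch comes from. The function name `mix_patches` and the parameters `patch_size` and `rgb_ratio` are hypothetical, and this sampling scheme is an assumption rather than the exact procedure in the MiPa repository.

```python
import numpy as np


def mix_patches(rgb, ir, patch_size=16, rgb_ratio=0.5, rng=None):
    """Hypothetical sketch of patch mixing (not the official MiPa code).

    Each non-overlapping patch of the output is copied from either the RGB
    or the IR image. Assumes rgb and ir are co-registered (H, W, C) arrays
    with identical shapes, and that H and W are divisible by patch_size.

    Returns the mixed image and a (H // patch_size, W // patch_size) map of
    per-patch modality labels (0 = RGB, 1 = IR).
    """
    if rng is None:
        rng = np.random.default_rng()
    if rgb.shape != ir.shape:
        raise ValueError("RGB and IR images must be co-registered with the same shape")
    h, w = rgb.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    grid_h, grid_w = h // patch_size, w // patch_size
    # One Bernoulli draw per patch (assumption): True means the patch is taken
    # from IR, which happens with probability 1 - rgb_ratio.
    patch_is_ir = rng.random((grid_h, grid_w)) >= rgb_ratio
    # Expand the patch-level mask to pixel resolution.
    pixel_mask = patch_is_ir.repeat(patch_size, axis=0).repeat(patch_size, axis=1)
    # Broadcast the (H, W, 1) mask over channels to pick IR or RGB pixels.
    mixed = np.where(pixel_mask[..., None], ir, rgb)
    return mixed, patch_is_ir.astype(np.int64)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = np.zeros((64, 64, 3), dtype=np.float32)  # stand-in RGB image
    ir = np.ones((64, 64, 3), dtype=np.float32)    # stand-in IR image
    mixed, modality = mix_patches(rgb, ir, patch_size=16, rgb_ratio=0.5, rng=rng)
    print(mixed.shape, modality.shape)  # (64, 64, 3) (4, 4)
```

During training, the mixed image would stand in for a single-modality input to the shared encoder, and `modality.reshape(-1)` yields one label per patch token in row-major order. Such labels could, for example, supervise a patch-wise modality classifier trained adversarially (e.g. through a gradient reversal layer) to encourage modality-agnostic tokens; this pairing is an illustrative assumption, not a description of the paper's module.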

Results

Dataset	Metric	Value	Model	Leaderboards
FLIR	AP50	0.813	MiPa	Object Detection; 2D Object Detection
LLVIP	AP	0.665	MiPa	Object Detection; 2D Object Detection; Pedestrian Detection; Autonomous Vehicles
