Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection

Heitor R. Medeiros, David Latortue, Eric Granger, Marco Pedersoli

2024-04-29 · Autonomous Driving · Multispectral Object Detection · Pedestrian Detection · Object Detection

Paper · PDF · Code (official)

Abstract

In real-world scenarios, using multiple modalities like visible (RGB) and infrared (IR) can greatly improve the performance of a predictive task such as object detection (OD). Multimodal learning is a common way to leverage these modalities, where multiple modality-specific encoders and a fusion module are used to improve performance. In this paper, we tackle a different way to employ RGB and IR modalities, where only one modality or the other is observed by a single shared vision encoder. This realistic setting requires a lower memory footprint and is more suitable for applications such as autonomous driving and surveillance, which commonly rely on RGB and IR data. However, when learning a single encoder on multiple modalities, one modality can dominate the other, producing uneven recognition results. This work investigates how to efficiently leverage RGB and IR modalities to train a common transformer-based OD vision encoder, while countering the effects of modality imbalance. For this, we introduce a novel training technique to Mix Patches (MiPa) from the two modalities, in conjunction with a patch-wise modality agnostic module, for learning a common representation of both modalities. Our experiments show that MiPa can learn a representation to reach competitive results on traditional RGB/IR benchmarks while only requiring a single modality during inference. Our code is available at: https://github.com/heitorrapela/MiPa.
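The core training idea described above, assembling each training image patch-wise from the RGB and IR views of the same scene so that a single shared encoder sees both modalities, can be sketched in a few lines. This is a minimal illustration of the patch-mixing concept only, not the authors' implementation (see the linked repository for that); the function name, the 50/50 per-patch sampling, and the NumPy setup are assumptions for the sketch.

```python
import numpy as np

def mix_patches(rgb, ir, patch=16, rng=None):
    """Illustrative patch mixing: for each non-overlapping patch location,
    randomly take the patch from either the RGB or the IR image.
    Both inputs must be aligned arrays of identical shape (H, W, C)."""
    rng = rng or np.random.default_rng()
    out = rgb.copy()
    h, w = rgb.shape[:2]
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            if rng.random() < 0.5:  # per-patch modality choice
                out[y:y + patch, x:x + patch] = ir[y:y + patch, x:x + patch]
    return out

# Toy example: 64x64 single-channel stand-ins for the two modalities.
rgb = np.zeros((64, 64, 1), dtype=np.float32)
ir = np.ones((64, 64, 1), dtype=np.float32)
mixed = mix_patches(rgb, ir, patch=16, rng=np.random.default_rng(0))
```

In a transformer-based detector the same mixing would naturally happen at the patch-embedding stage rather than in pixel space, but the effect is the same: every training sample exposes the shared encoder to patches from both modalities, which is what counters one modality dominating the other.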

Results

Task                  Dataset  Metric  Value  Model
Autonomous Vehicles   LLVIP    AP      0.665  MiPa
Object Detection      FLIR     AP 0.5  0.813  MiPa
Object Detection      LLVIP    AP      0.665  MiPa
3D                    FLIR     AP 0.5  0.813  MiPa
3D                    LLVIP    AP      0.665  MiPa
2D Classification     FLIR     AP 0.5  0.813  MiPa
2D Classification     LLVIP    AP      0.665  MiPa
Pedestrian Detection  LLVIP    AP      0.665  MiPa
2D Object Detection   FLIR     AP 0.5  0.813  MiPa
2D Object Detection   LLVIP    AP      0.665  MiPa
16k                   FLIR     AP 0.5  0.813  MiPa
16k                   LLVIP    AP      0.665  MiPa

Related Papers

GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
LaViPlan: Language-Guided Visual Path Planning with RLVR (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)