Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Multi-Modal Fusion for End-to-End RGB-T Tracking

Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost Van de Weijer, Fahad Shahbaz Khan

2019-08-30 · RGB-T Tracking · Image-to-Image Translation

Abstract

We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components of DiMP, i.e. the feature extractor, the target estimation network, and the classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level, and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As training data, we generate a large-scale RGB-T dataset by taking an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images with an image-to-image translation approach. We perform extensive experiments on the VOT-RGBT2019 and RGBT210 datasets, evaluating each type of modality fusion on each model component. The results show that the proposed fusion mechanisms improve performance over their single-modality counterparts. We obtain our best results when fusing at the feature level in both the IoU-Net and the model predictor, reaching an EAO score of 0.391 on the VOT-RGBT2019 dataset. With this fusion mechanism we achieve state-of-the-art performance on the RGBT210 dataset.
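The feature-level fusion described above can be sketched in a few lines: the RGB and TIR feature maps are concatenated along the channel axis and mixed with a learned 1x1 convolution before being passed to the downstream component (IoU-Net or model predictor). The snippet below is a minimal numpy illustration of that idea; the function name `fuse_features`, the tensor shapes, and the plain matrix-multiply stand-in for the 1x1 convolution are assumptions for clarity, not the paper's actual implementation.

```python
import numpy as np

def fuse_features(rgb_feat, tir_feat, w):
    """Feature-level fusion sketch (illustrative, not the paper's code):
    concatenate RGB and TIR feature maps along the channel axis, then
    mix channels with a learned 1x1 convolution, implemented here as a
    matrix multiply over the channel dimension."""
    # rgb_feat, tir_feat: (C, H, W); w: (C_out, 2*C) mixing weights
    stacked = np.concatenate([rgb_feat, tir_feat], axis=0)  # (2C, H, W)
    c2, h, width = stacked.shape
    flat = stacked.reshape(c2, h * width)                   # (2C, H*W)
    fused = w @ flat                                        # (C_out, H*W)
    return fused.reshape(-1, h, width)                      # (C_out, H, W)

# Toy example: 4-channel feature maps on an 8x8 spatial grid.
rgb = np.random.rand(4, 8, 8)
tir = np.random.rand(4, 8, 8)
w = np.random.rand(4, 8) / 8.0  # maps 8 stacked channels back to 4
fused = fuse_features(rgb, tir, w)
print(fused.shape)  # (4, 8, 8)
```

Because the mixing weights `w` sit inside the end-to-end training graph, the network can learn how much each modality contributes per output channel, which is the key difference from fixed pixel-level averaging.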

Results

Task | Dataset | Metric | Value | Model
Visual Tracking | LasHeR | Precision | 44.7 | mfDiMP
Visual Tracking | LasHeR | Success | 34.3 | mfDiMP
Visual Tracking | RGBT210 | Precision | 78.6 | mfDiMP
Visual Tracking | RGBT210 | Success | 55.5 | mfDiMP

Related Papers

CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation (2025-06-29)
ThermalDiffusion: Visual-to-Thermal Image-to-Image Translation for Autonomous Navigation (2025-06-26)
Lightweight RGB-T Tracking with Mobile Vision Transformers (2025-06-23)
Transforming H&E images into IHC: A Variance-Penalized GAN for Precision Oncology (2025-06-23)
Optimal Transport Driven Asymmetric Image-to-Image Translation for Nuclei Segmentation of Histological Images (2025-06-08)
Deep learning image burst stacking to reconstruct high-resolution ground-based solar observations (2025-06-05)
Multi-Platform Methane Plume Detection via Model and Domain Adaptation (2025-06-02)
Segmenting France Across Four Centuries (2025-05-30)