Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Cross-modulated Attention Transformer for RGBT Tracking

Yun Xiao, Jiacong Zhao, Andong Lu, Chenglong Li, Yin Lin, Bing Yin, Cong Liu

2024-08-05 · RGB-T Tracking
Paper · PDF

Abstract

Existing Transformer-based RGBT trackers achieve remarkable performance by leveraging self-attention to extract uni-modal features and cross-attention to enhance multi-modal feature interaction and template-search correlation computation. Nevertheless, the independent search-template correlation calculations ignore the consistency between branches, which can result in ambiguous and inappropriate correlation weights. This not only limits the intra-modal feature representation, but also harms the robustness of cross-attention for multi-modal feature interaction and search-template correlation computation. To address these issues, we propose a novel approach called Cross-modulated Attention Transformer (CAFormer), which performs intra-modality self-correlation, inter-modality feature interaction, and search-template correlation computation in a unified attention model for RGBT tracking. In particular, we first independently generate correlation maps for each modality and feed them into the designed Correlation Modulated Enhancement module, which modulates inaccurate correlation weights by seeking consensus between modalities. This design unifies the self-attention and cross-attention schemes, which not only alleviates inaccurate attention-weight computation in self-attention but also eliminates the redundant computation introduced by an extra cross-attention scheme. In addition, we propose a collaborative token elimination strategy to further improve tracking inference efficiency and accuracy. Extensive experiments on five public RGBT tracking benchmarks show the outstanding performance of the proposed CAFormer against state-of-the-art methods.
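The core idea in the abstract — compute each modality's correlation (attention) map independently, then modulate both maps by a cross-modal consensus before applying attention — can be illustrated with a minimal PyTorch sketch. Note this is only a sketch of the general mechanism under stated assumptions: the class name `CrossModulatedAttention`, the simple averaging consensus, and the learnable gate are illustrative placeholders, not the paper's Correlation Modulated Enhancement module, and the collaborative token elimination strategy is not reproduced here.

```python
import torch
import torch.nn as nn

class CrossModulatedAttention(nn.Module):
    """Sketch: per-modality self-correlation maps are modulated by a
    cross-modal consensus before attention is applied. Illustrative
    only; the paper's CME module details differ."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        # separate QKV projections for the RGB and thermal streams
        self.qkv_rgb = nn.Linear(dim, dim * 3)
        self.qkv_tir = nn.Linear(dim, dim * 3)
        self.proj_rgb = nn.Linear(dim, dim)
        self.proj_tir = nn.Linear(dim, dim)
        # learnable gate weighting each modality's own correlation map
        # against the consensus (an assumption for this sketch)
        self.gate = nn.Parameter(torch.tensor(0.5))

    def _qkv(self, x, proj):
        B, N, C = x.shape
        qkv = proj(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        return q, k, v

    def forward(self, x_rgb, x_tir):
        # 1) independent correlation maps per modality
        q_r, k_r, v_r = self._qkv(x_rgb, self.qkv_rgb)
        q_t, k_t, v_t = self._qkv(x_tir, self.qkv_tir)
        attn_r = (q_r @ k_r.transpose(-2, -1)) * self.scale
        attn_t = (q_t @ k_t.transpose(-2, -1)) * self.scale
        # 2) consensus between modalities (here a simple average;
        #    the paper's CME module learns this modulation)
        consensus = 0.5 * (attn_r + attn_t)
        attn_r = (1 - self.gate) * attn_r + self.gate * consensus
        attn_t = (1 - self.gate) * attn_t + self.gate * consensus
        # 3) standard attention readout per modality
        out_r = attn_r.softmax(dim=-1) @ v_r
        out_t = attn_t.softmax(dim=-1) @ v_t
        B, _, N, _ = out_r.shape
        out_r = out_r.transpose(1, 2).reshape(B, N, -1)
        out_t = out_t.transpose(1, 2).reshape(B, N, -1)
        return self.proj_rgb(out_r), self.proj_tir(out_t)

# usage: concatenated template+search tokens for both modalities
tokens_rgb = torch.randn(2, 320, 256)  # (batch, tokens, dim)
tokens_tir = torch.randn(2, 320, 256)
block = CrossModulatedAttention(dim=256, num_heads=4)
y_rgb, y_tir = block(tokens_rgb, tokens_tir)
print(y_rgb.shape, y_tir.shape)  # torch.Size([2, 320, 256]) each
```

The intuition behind the modulation step is that correlations both modalities agree on are reinforced, while weights that only one modality assigns strongly (and which may therefore be spurious) are suppressed, all within a single attention pass rather than separate self- and cross-attention stages.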

Results

Task             Dataset   Metric     Value   Model
Visual Tracking  LasHeR    Precision  70      CAFormer
Visual Tracking  LasHeR    Success    55.6    CAFormer
Visual Tracking  GTOT      Precision  91.8    CAFormer
Visual Tracking  GTOT      Success    76.9    CAFormer
Visual Tracking  RGBT234   Precision  88.3    CAFormer
Visual Tracking  RGBT234   Success    66.4    CAFormer
Visual Tracking  RGBT210   Precision  85.6    CAFormer
Visual Tracking  RGBT210   Success    63.2    CAFormer

Related Papers

Lightweight RGB-T Tracking with Mobile Vision Transformers (2025-06-23)
Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking (2025-05-06)
Breaking Shallow Limits: Task-Driven Pixel Fusion for Gap-free RGBT Tracking (2025-03-14)
Adaptive Perception for Unified Visual Multi-modal Object Tracking (2025-02-10)
BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination (2025-01-07)
PURA: Parameter Update-Recovery Test-Time Adaption for RGB-T Tracking (2025-01-01)
SUTrack: Towards Simple and Unified Single Object Tracking (2024-12-26)
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking (2024-12-20)