

MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu

2024-08-15 · RGB-T Tracking

Abstract

Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture. Nonetheless, these methods mainly adopt image-pair appearance matching and face the intrinsic quadratic complexity of the attention mechanism, resulting in constrained exploitation of temporal information. Inspired by the recently emerged State Space Model Mamba, renowned for its impressive long-sequence modeling capability and linear computational complexity, this work proposes a pure Mamba-based framework (MambaVT) to fully exploit spatio-temporal contextual modeling for robust visible-thermal tracking. Specifically, we devise a long-range cross-frame integration component to globally adapt to target appearance variations, and introduce short-term historical trajectory prompts to predict subsequent target states based on local temporal location clues. Extensive experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks while requiring lower computational cost. We hope this work will serve as a simple yet strong baseline and stimulate future research in this field. The code and pre-trained models will be made available.
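To make the high-level description more concrete, the following is a minimal, illustrative sketch of a Mamba-style selective state-space scan applied to a spatio-temporal token sequence built from multi-frame RGB and thermal patch tokens plus a few trajectory-prompt tokens. The module structure, shapes, and token layout are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch (NOT the authors' code): a simplified selective state-space
# (Mamba-style) scan over a spatio-temporal token sequence built from RGB and
# thermal patch tokens of several frames plus a few trajectory-prompt tokens.
# All names, shapes, and the token layout are illustrative assumptions.
import torch
import torch.nn as nn

class SimpleSelectiveSSM(nn.Module):
    """Toy selective-scan block: input-dependent (delta, B, C) as in Mamba,
    diagonal negative A, recurrence h_t = exp(delta_t*A)*h_{t-1} + delta_t*B_t*x_t."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A_log = nn.Parameter(torch.zeros(d_model, d_state))   # A = -exp(A_log) < 0
        self.delta_proj = nn.Linear(d_model, d_model)
        self.B_proj = nn.Linear(d_model, d_state)
        self.C_proj = nn.Linear(d_model, d_state)
        self.D = nn.Parameter(torch.ones(d_model))                  # skip connection

    def forward(self, x):                                           # x: (batch, seq_len, d_model)
        b, L, d = x.shape
        A = -torch.exp(self.A_log)                                  # (d, n)
        delta = torch.nn.functional.softplus(self.delta_proj(x))    # (b, L, d)
        Bp = self.B_proj(x)                                         # (b, L, n)
        Cp = self.C_proj(x)                                         # (b, L, n)
        h = torch.zeros(b, d, A.shape[1], device=x.device)          # hidden state (b, d, n)
        ys = []
        for t in range(L):                                          # linear-time scan over tokens
            dA = torch.exp(delta[:, t].unsqueeze(-1) * A)           # (b, d, n)
            dB = delta[:, t].unsqueeze(-1) * Bp[:, t].unsqueeze(1)  # (b, d, n)
            h = dA * h + dB * x[:, t].unsqueeze(-1)
            y = (h * Cp[:, t].unsqueeze(1)).sum(-1) + self.D * x[:, t]
            ys.append(y)
        return torch.stack(ys, dim=1)                               # (b, L, d_model)

# Assumed token layout: patch tokens from T frames of both modalities,
# followed by a handful of short-term trajectory-prompt tokens.
B, T, P, D = 2, 4, 64, 192                                          # batch, frames, patches/frame, dim
rgb = torch.randn(B, T * P, D)                                      # RGB patch tokens across frames
thermal = torch.randn(B, T * P, D)                                  # thermal patch tokens across frames
traj_prompts = torch.randn(B, 8, D)                                 # short-term trajectory prompts
tokens = torch.cat([rgb, thermal, traj_prompts], dim=1)             # (B, 2*T*P + 8, D)

block = SimpleSelectiveSSM(d_model=D)
out = block(tokens)
print(out.shape)                                                    # torch.Size([2, 520, 192])
```

The relevant property the toy scan illustrates is the linear cost in sequence length: the recurrence visits each token once, which is why appending more frames or prompt tokens grows compute linearly rather than quadratically as with attention.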

Results

Task | Dataset | Metric | Value | Model
Visual Tracking | LasHeR | Precision | 73 | MambaVT-S256
Visual Tracking | LasHeR | Success | 57.9 | MambaVT-S256
Visual Tracking | LasHeR | Precision | 72.7 | MambaVT-M256
Visual Tracking | LasHeR | Success | 57.5 | MambaVT-M256
Visual Tracking | GTOT | Precision | 95.2 | MambaVT-M256
Visual Tracking | GTOT | Success | 78.6 | MambaVT-M256
Visual Tracking | GTOT | Precision | 94.1 | MambaVT-S256
Visual Tracking | GTOT | Success | 75.3 | MambaVT-S256
Visual Tracking | RGBT234 | Precision | 90.7 | MambaVT-M256
Visual Tracking | RGBT234 | Success | 67.5 | MambaVT-M256
Visual Tracking | RGBT234 | Precision | 88.9 | MambaVT-S256
Visual Tracking | RGBT234 | Success | 65.8 | MambaVT-S256
Visual Tracking | RGBT210 | Precision | 88.5 | MambaVT-M256
Visual Tracking | RGBT210 | Success | 64.4 | MambaVT-M256
Visual Tracking | RGBT210 | Precision | 88 | MambaVT-S256
Visual Tracking | RGBT210 | Success | 63.7 | MambaVT-S256
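For context on the table, Precision and Success are the standard one-pass-evaluation scores used by these RGB-T benchmarks. Below is a rough sketch of how such metrics are commonly computed; the 20-pixel center-error threshold and the 21-point overlap sweep are conventional choices assumed for illustration, not details taken from this paper.

```python
# Sketch of typical one-pass-evaluation metrics behind "Precision" and
# "Success" on tracking benchmarks (conventions assumed, not from this paper).
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between predicted and ground-truth box centers.
    Boxes are (x, y, w, h) arrays of shape (num_frames, 4)."""
    pc = pred[:, :2] + pred[:, 2:] / 2
    gc = gt[:, :2] + gt[:, 2:] / 2
    return np.linalg.norm(pc - gc, axis=1)

def iou(pred, gt):
    """Per-frame intersection-over-union for (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-9)

def precision_rate(pred, gt, threshold=20.0):
    """Fraction of frames with center error below `threshold` pixels."""
    return float((center_error(pred, gt) <= threshold).mean())

def success_score(pred, gt):
    """Area under the success curve: mean fraction of frames whose IoU exceeds
    each overlap threshold in [0, 1] (approaches mean IoU as the sweep gets finer)."""
    overlaps = iou(pred, gt)
    thresholds = np.linspace(0, 1, 21)
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))

# Toy usage on random boxes
rng = np.random.default_rng(0)
gt = np.abs(rng.normal(100, 20, size=(50, 4)))
pred = gt + rng.normal(0, 5, size=(50, 4))
print(precision_rate(pred, gt), success_score(pred, gt))
```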

Related Papers

Lightweight RGB-T Tracking with Mobile Vision Transformers (2025-06-23)
Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking (2025-05-06)
Breaking Shallow Limits: Task-Driven Pixel Fusion for Gap-free RGBT Tracking (2025-03-14)
Adaptive Perception for Unified Visual Multi-modal Object Tracking (2025-02-10)
BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination (2025-01-07)
PURA: Parameter Update-Recovery Test-Time Adaption for RGB-T Tracking (2025-01-01)
SUTrack: Towards Simple and Unified Single Object Tracking (2024-12-26)
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking (2024-12-20)