

MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu

2024-08-15 · RGB-T Tracking

Abstract

Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture. Nonetheless, these methods mainly adopt image-pair appearance matching and face the intrinsic quadratic complexity of the attention mechanism, resulting in constrained exploitation of temporal information. Inspired by the recently emerged State Space Model Mamba, renowned for its impressive long-sequence modeling capability and linear computational complexity, this work proposes a pure Mamba-based framework (MambaVT) to fully exploit spatio-temporal contextual modeling for robust visible-thermal tracking. Specifically, we devise a long-range cross-frame integration component to globally adapt to target appearance variations, and introduce short-term historical trajectory prompts to predict subsequent target states based on local temporal location clues. Extensive experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks while requiring lower computational cost. We hope this work will serve as a simple yet strong baseline and stimulate future research in this field. The code and pre-trained models will be made available.
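To make the high-level description more concrete, the following is a minimal, illustrative sketch of a Mamba-style selective state-space scan applied to a spatio-temporal token sequence built from multi-frame RGB and thermal patch tokens plus a few trajectory-prompt tokens. The module structure, shapes, and token layout are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch (NOT the authors' code): a simplified selective state-space
# (Mamba-style) scan over a spatio-temporal token sequence built from RGB and
# thermal patch tokens of several frames plus a few trajectory-prompt tokens.
# All names, shapes, and the token layout are illustrative assumptions.
import torch
import torch.nn as nn

class SimpleSelectiveSSM(nn.Module):
    """Toy selective-scan block: input-dependent (delta, B, C) as in Mamba,
    diagonal negative A, recurrence h_t = exp(delta_t*A)*h_{t-1} + delta_t*B_t*x_t."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A_log = nn.Parameter(torch.zeros(d_model, d_state))   # A = -exp(A_log) < 0
        self.delta_proj = nn.Linear(d_model, d_model)
        self.B_proj = nn.Linear(d_model, d_state)
        self.C_proj = nn.Linear(d_model, d_state)
        self.D = nn.Parameter(torch.ones(d_model))                  # skip connection

    def forward(self, x):                                           # x: (batch, seq_len, d_model)
        b, L, d = x.shape
        A = -torch.exp(self.A_log)                                  # (d, n)
        delta = torch.nn.functional.softplus(self.delta_proj(x))    # (b, L, d)
        Bp = self.B_proj(x)                                         # (b, L, n)
        Cp = self.C_proj(x)                                         # (b, L, n)
        h = torch.zeros(b, d, A.shape[1], device=x.device)          # hidden state (b, d, n)
        ys = []
        for t in range(L):                                          # linear-time scan over tokens
            dA = torch.exp(delta[:, t].unsqueeze(-1) * A)           # (b, d, n)
            dB = delta[:, t].unsqueeze(-1) * Bp[:, t].unsqueeze(1)  # (b, d, n)
            h = dA * h + dB * x[:, t].unsqueeze(-1)
            y = (h * Cp[:, t].unsqueeze(1)).sum(-1) + self.D * x[:, t]
            ys.append(y)
        return torch.stack(ys, dim=1)                               # (b, L, d_model)

# Assumed token layout: patch tokens from T frames of both modalities,
# followed by a handful of short-term trajectory-prompt tokens.
B, T, P, D = 2, 4, 64, 192                                          # batch, frames, patches/frame, dim
rgb = torch.randn(B, T * P, D)                                      # RGB patch tokens across frames
thermal = torch.randn(B, T * P, D)                                  # thermal patch tokens across frames
traj_prompts = torch.randn(B, 8, D)                                 # short-term trajectory prompts
tokens = torch.cat([rgb, thermal, traj_prompts], dim=1)             # (B, 2*T*P + 8, D)

block = SimpleSelectiveSSM(d_model=D)
out = block(tokens)
print(out.shape)                                                    # torch.Size([2, 520, 192])
```

The relevant property the toy scan illustrates is the linear cost in sequence length: the recurrence visits each token once, which is why appending more frames or prompt tokens grows compute linearly rather than quadratically as with attention.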

Results

Task | Dataset | Metric | Value | Model
Visual Tracking | LasHeR | Precision | 73 | MambaVT-S256
Visual Tracking | LasHeR | Success | 57.9 | MambaVT-S256
Visual Tracking | LasHeR | Precision | 72.7 | MambaVT-M256
Visual Tracking | LasHeR | Success | 57.5 | MambaVT-M256
Visual Tracking | GTOT | Precision | 95.2 | MambaVT-M256
Visual Tracking | GTOT | Success | 78.6 | MambaVT-M256
Visual Tracking | GTOT | Precision | 94.1 | MambaVT-S256
Visual Tracking | GTOT | Success | 75.3 | MambaVT-S256
Visual Tracking | RGBT234 | Precision | 90.7 | MambaVT-M256
Visual Tracking | RGBT234 | Success | 67.5 | MambaVT-M256
Visual Tracking | RGBT234 | Precision | 88.9 | MambaVT-S256
Visual Tracking | RGBT234 | Success | 65.8 | MambaVT-S256
Visual Tracking | RGBT210 | Precision | 88.5 | MambaVT-M256
Visual Tracking | RGBT210 | Success | 64.4 | MambaVT-M256
Visual Tracking | RGBT210 | Precision | 88 | MambaVT-S256
Visual Tracking | RGBT210 | Success | 63.7 | MambaVT-S256
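For context on the table, Precision and Success are the standard one-pass-evaluation scores used by these RGB-T benchmarks. Below is a rough sketch of how such metrics are commonly computed; the 20-pixel center-error threshold and the 21-point overlap sweep are conventional choices assumed for illustration, not details taken from this paper.

```python
# Sketch of typical one-pass-evaluation metrics behind "Precision" and
# "Success" on tracking benchmarks (conventions assumed, not from this paper).
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between predicted and ground-truth box centers.
    Boxes are (x, y, w, h) arrays of shape (num_frames, 4)."""
    pc = pred[:, :2] + pred[:, 2:] / 2
    gc = gt[:, :2] + gt[:, 2:] / 2
    return np.linalg.norm(pc - gc, axis=1)

def iou(pred, gt):
    """Per-frame intersection-over-union for (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-9)

def precision_rate(pred, gt, threshold=20.0):
    """Fraction of frames with center error below `threshold` pixels."""
    return float((center_error(pred, gt) <= threshold).mean())

def success_score(pred, gt):
    """Area under the success curve: mean fraction of frames whose IoU exceeds
    each overlap threshold in [0, 1] (approaches mean IoU as the sweep gets finer)."""
    overlaps = iou(pred, gt)
    thresholds = np.linspace(0, 1, 21)
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))

# Toy usage on random boxes
rng = np.random.default_rng(0)
gt = np.abs(rng.normal(100, 20, size=(50, 4)))
pred = gt + rng.normal(0, 5, size=(50, 4))
print(precision_rate(pred, gt), success_score(pred, gt))
```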

Related Papers

Lightweight RGB-T Tracking with Mobile Vision Transformers (2025-06-23)
Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking (2025-05-06)
Breaking Shallow Limits: Task-Driven Pixel Fusion for Gap-free RGBT Tracking (2025-03-14)
Adaptive Perception for Unified Visual Multi-modal Object Tracking (2025-02-10)
BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination (2025-01-07)
PURA: Parameter Update-Recovery Test-Time Adaption for RGB-T Tracking (2025-01-01)
SUTrack: Towards Simple and Unified Single Object Tracking (2024-12-26)
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking (2024-12-20)