RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba

Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, Bin Luo

2024-08-16

RGB-T Tracking

Abstract

Existing RGBT tracking methods often design various interaction models to perform cross-modal fusion at each layer, but cannot perform feature interactions among all layers, which play a critical role in robust multimodal representation, because of the large computational burden. To address this issue, this paper presents a novel All-layer multimodal Interaction Network, named AINet, which performs efficient and effective feature interactions across all modalities and layers in a progressive fusion Mamba for robust RGBT tracking. Although modality features in different layers are known to contain different cues, building multimodal interactions in each layer is challenging because interaction capability must be balanced against efficiency. Considering that the feature discrepancy between the RGB and thermal modalities reflects their complementary information to some extent, we design a Difference-based Fusion Mamba (DFM) to achieve enhanced fusion of different modalities with linear complexity. Interacting with features from all layers involves a very long token sequence (3840 tokens in this work), so the computational burden is large. To handle this problem, we design an Order-dynamic Fusion Mamba (OFM) that performs efficient and effective feature interactions across all layers by dynamically adjusting the scan order of different layers in Mamba. Extensive experiments on four public RGBT tracking datasets show that AINet achieves leading performance against existing state-of-the-art methods.
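The two ideas in the abstract, fusing on the modality difference in linear time and changing the order in which layer features are scanned, can be illustrated with a toy NumPy sketch. Nothing here is taken from the AINet implementation: the function names, the fixed-decay recurrence standing in for a Mamba block, the score-based ordering rule, and the input sizes (128-pixel template, 256-pixel search region, 16-pixel patches, 12 layers) are all assumptions made for illustration. Under those assumed sizes each layer contributes 320 tokens, and 12 layers give 3840, which happens to agree with the count quoted in the abstract.

```python
import numpy as np

def tokens_per_layer(template=128, search=256, patch=16):
    # Assumed ViT-style patching; these sizes are illustrative, not from the paper.
    return (template // patch) ** 2 + (search // patch) ** 2

def linear_scan(x, decay=0.9):
    # Toy recurrence h_t = decay * h_{t-1} + x_t, computed in one pass (O(N) in length).
    # A stand-in for a selective state-space (Mamba) block, not a real one.
    h = np.zeros(x.shape[1])
    out = np.empty(x.shape, dtype=float)
    for t in range(x.shape[0]):
        h = decay * h + x[t]
        out[t] = h
    return out

def difference_fusion(rgb, tir):
    # Hypothetical fusion rule: scan the RGB-thermal difference, add it to the sum.
    return rgb + tir + linear_scan(rgb - tir)

def reorder_layers(layer_feats, scores):
    # Hypothetical ordering rule: concatenate per-layer token blocks by ascending score.
    order = np.argsort(scores, kind="stable")
    return np.concatenate([layer_feats[i] for i in order], axis=0), order
```

With identical RGB and thermal inputs the difference is zero, so `difference_fusion` reduces to a plain sum; the scan contributes only where the modalities disagree, which is the intuition the abstract gives for DFM.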

Results

Task	Dataset	Metric	Value	Model
Visual Tracking	LasHeR	Precision	74.2	AINet-B384
Visual Tracking	LasHeR	Success	59.1	AINet-B384
Visual Tracking	RGBT234	Precision	89.2	AINet-B384
Visual Tracking	RGBT234	Success	67.3	AINet-B384
Visual Tracking	RGBT210	Precision	87.5	AINet-B384
Visual Tracking	RGBT210	Success	64.8	AINet-B384
