Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Transformer Tracking

Xin Chen, Bin Yan, Jiawen Zhu, Dong Wang, Xiaoyun Yang, Huchuan Lu

2021-03-29 · CVPR 2021 · Visual Object Tracking · Visual Tracking · Object Tracking · Video Object Tracking
Paper · PDF · Code (official)

Abstract

Correlation plays a critical role in the tracking field, especially in recent popular Siamese-based trackers. The correlation operation is a simple fusion method for measuring the similarity between the template and the search region. However, correlation itself is a local linear matching process that tends to lose semantic information and fall into local optima, which may be a bottleneck in designing high-accuracy tracking algorithms. Is there a better feature fusion method than correlation? To address this issue, inspired by Transformer, this work presents a novel attention-based feature fusion network, which effectively combines the template and search region features using attention alone. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. Finally, we present a Transformer tracking method (named TransT) based on a Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and a classification and regression head. Experiments show that TransT achieves very promising results on six challenging datasets, especially on the large-scale LaSOT, TrackingNet, and GOT-10k benchmarks. Our tracker runs at approximately 50 fps on GPU. Code and models are available at https://github.com/chenxin-dlut/TransT.
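The core idea of the fusion described above — search-region tokens first attending to themselves (the self-attention "ego-context augment"), then attending to template tokens (the cross-attention "cross-feature augment") — can be illustrated with a minimal single-head attention sketch. This is not the authors' implementation (TransT uses multi-head attention with positional encodings, in PyTorch); the function and variable names below are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query_feats, kv_feats):
    """Single-head scaled dot-product attention.

    query_feats: (Nq, d) tokens producing queries
    kv_feats:    (Nkv, d) tokens producing keys and values
    Returns (Nq, d): each query token as a weighted sum of value tokens.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ kv_feats.T / np.sqrt(d)  # (Nq, Nkv) similarity
    weights = softmax(scores, axis=-1)              # rows sum to 1
    return weights @ kv_feats                       # (Nq, d)

# toy example: 4 template tokens and 6 search-region tokens, 8-dim features
rng = np.random.default_rng(0)
template = rng.standard_normal((4, 8))
search = rng.standard_normal((6, 8))

# ego-context augment ~ self-attention within the search region
search_augmented = attend(search, search)
# cross-feature augment ~ search tokens attend to template tokens
fused = attend(search_augmented, template)
print(fused.shape)  # (6, 8)
```

Unlike a correlation operation, which reduces template–search similarity to a single linear response map, each fused token here is a semantically weighted combination of template features, which is the property the abstract argues avoids the local-linear-matching bottleneck.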

Results

| Task                   | Dataset   | Metric               | Value | Model  |
|------------------------|-----------|----------------------|-------|--------|
| Video                  | NT-VOT211 | AUC                  | 36.79 | TransT |
| Video                  | NT-VOT211 | Precision            | 51.97 | TransT |
| Visual Tracking        | TNL2K     | AUC                  | 50.7  | TransT |
| Object Tracking        | COESOT    | Precision Rate       | 67.9  | TransT |
| Object Tracking        | COESOT    | Success Rate         | 60.5  | TransT |
| Object Tracking        | LaSOT     | AUC                  | 64.9  | TransT |
| Object Tracking        | LaSOT     | Normalized Precision | 73.8  | TransT |
| Object Tracking        | LaSOT     | Precision            | 69    | TransT |
| Object Tracking        | DiDi      | Tracking quality     | 0.465 | TransT |
| Object Tracking        | AVisT     | Success Rate         | 49.03 | TransT |
| Object Tracking        | NT-VOT211 | AUC                  | 36.79 | TransT |
| Object Tracking        | NT-VOT211 | Precision            | 51.97 | TransT |
| Visual Object Tracking | LaSOT     | AUC                  | 64.9  | TransT |
| Visual Object Tracking | LaSOT     | Normalized Precision | 73.8  | TransT |
| Visual Object Tracking | LaSOT     | Precision            | 69    | TransT |
| Visual Object Tracking | DiDi      | Tracking quality     | 0.465 | TransT |
| Visual Object Tracking | AVisT     | Success Rate         | 49.03 | TransT |

Related Papers

MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results (2025-07-17)
YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association (2025-07-16)
HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
What You Have is What You Track: Adaptive and Robust Multimodal Tracking (2025-07-08)
Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object Tracking (2025-07-07)
UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions (2025-07-01)
Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking (2025-06-30)
Visual and Memory Dual Adapter for Multi-Modal Object Tracking (2025-06-30)