Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning to Fuse Asymmetric Feature Maps in Siamese Trackers

Wencheng Han, Xingping Dong, Fahad Shahbaz Khan, Ling Shao, Jianbing Shen

Published: 2020-12-04 · CVPR 2021
Tasks: Visual Object Tracking · Visual Tracking · Video Object Tracking
Links: Paper · PDF · Code (official)

Abstract

Recently, Siamese-based trackers have achieved promising performance in visual tracking. Most recent Siamese-based trackers typically employ a depth-wise cross-correlation (DW-XCorr) to obtain multi-channel correlation information from the two feature maps (target and search region). However, DW-XCorr has several limitations within Siamese-based tracking: it can easily be fooled by distractors, has fewer activated channels, and provides weak discrimination of object boundaries. Further, DW-XCorr is a handcrafted parameter-free module and cannot fully benefit from offline learning on large-scale data. We propose a learnable module, called the asymmetric convolution (ACM), which learns to better capture the semantic correlation information in offline training on large-scale data. Different from DW-XCorr and its predecessor (XCorr), which regard a single feature map as the convolution kernel, our ACM decomposes the convolution operation on a concatenated feature map into two mathematically equivalent operations, thereby avoiding the need for the feature maps to be of the same size (width and height) during concatenation. Our ACM can incorporate useful prior information, such as bounding-box size, with standard visual features. Furthermore, ACM can easily be integrated into existing Siamese trackers based on DW-XCorr or XCorr. To demonstrate its generalization ability, we integrate ACM into three representative trackers: SiamFC, SiamRPN++, and SiamBAN. Our experiments reveal the benefits of the proposed ACM, which outperforms existing methods on six tracking benchmarks. On the LaSOT test set, our ACM-based tracker obtains a significant improvement of 5.8% in terms of success (AUC) over the baseline.
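The decomposition the abstract describes rests on the channel-wise linearity of convolution: a kernel applied to two feature maps concatenated along the channel axis produces the same output as the sum of two convolutions whose kernels are the corresponding channel slices, which is what lets the two inputs have different spatial sizes in ACM. A minimal NumPy sketch of that identity, using a 1×1 kernel and hypothetical shapes (not the paper's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
Cz, Cx, H, W = 4, 4, 6, 6              # channel counts and spatial size (illustrative)
z = rng.standard_normal((Cz, H, W))    # target-branch features
x = rng.standard_normal((Cx, H, W))    # search-branch features
w = rng.standard_normal((Cz + Cx,))    # 1x1 kernel over the concatenated channels

# Convolution applied to the channel-concatenated feature map ...
concat = np.concatenate([z, x], axis=0)
out_joint = np.tensordot(w, concat, axes=(0, 0))

# ... equals the sum of two independent operations, one per feature map,
# using the matching channel slices of the same kernel.
out_split = (np.tensordot(w[:Cz], z, axes=(0, 0))
             + np.tensordot(w[Cz:], x, axes=(0, 0)))

print(np.allclose(out_joint, out_split))  # True
```

Because the split form never materializes the concatenation, the two branches no longer need matching spatial sizes, which is the property ACM exploits to fuse asymmetric feature maps.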

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | NT-VOT211 | AUC | 35.8 | SiamBAN-ACM |
| Video | NT-VOT211 | Precision | 48.31 | SiamBAN-ACM |
| Object Tracking | LaSOT | AUC | 57.2 | SiamBAN-ACM |
| Object Tracking | LaSOT | Normalized Precision | 65.3 | SiamBAN-ACM |
| Object Tracking | LaSOT | Precision | 58.7 | SiamBAN-ACM |
| Object Tracking | TrackingNet | Accuracy | 75.3 | SiamBAN-ACM |
| Object Tracking | TrackingNet | Normalized Precision | 81 | SiamBAN-ACM |
| Object Tracking | TrackingNet | Precision | 71.2 | SiamBAN-ACM |
| Object Tracking | NT-VOT211 | AUC | 35.8 | SiamBAN-ACM |
| Object Tracking | NT-VOT211 | Precision | 48.31 | SiamBAN-ACM |
| Visual Object Tracking | LaSOT | AUC | 57.2 | SiamBAN-ACM |
| Visual Object Tracking | LaSOT | Normalized Precision | 65.3 | SiamBAN-ACM |
| Visual Object Tracking | LaSOT | Precision | 58.7 | SiamBAN-ACM |
| Visual Object Tracking | TrackingNet | Accuracy | 75.3 | SiamBAN-ACM |
| Visual Object Tracking | TrackingNet | Normalized Precision | 81 | SiamBAN-ACM |
| Visual Object Tracking | TrackingNet | Precision | 71.2 | SiamBAN-ACM |

Related Papers

HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
What You Have is What You Track: Adaptive and Robust Multimodal Tracking (2025-07-08)
UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions (2025-07-01)
Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking (2025-06-30)
R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning (2025-06-27)
Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking (2025-06-25)
Comparison of Two Methods for Stationary Incident Detection Based on Background Image (2025-06-17)
Towards Effective and Efficient Adversarial Defense with Diffusion Models for Robust Visual Tracking (2025-05-31)