Yuhong Li, Xiaofan Zhang, Deming Chen
Recently, we have seen rapid development of Deep Neural Network (DNN) based visual tracking solutions. Some trackers combine DNN-based solutions with Discriminative Correlation Filters (DCF) to extract semantic features and deliver state-of-the-art tracking accuracy. However, these solutions are highly compute-intensive and require long processing times, so real-time performance cannot be guaranteed. To deliver both high accuracy and reliable real-time performance, we propose a novel tracker called SiamVGG (https://github.com/leeyeehoo/SiamVGG). It combines a Convolutional Neural Network (CNN) backbone with a cross-correlation operator, and takes advantage of the features from exemplar images for more accurate object tracking. The architecture of SiamVGG is customized from VGG-16, with parameters shared between the exemplar images and the input video frames. We evaluate SiamVGG on the OTB-2013/50/100 and VOT 2015/2016/2017 datasets, where it achieves state-of-the-art accuracy while maintaining real-time performance of 50 FPS on a GTX 1080Ti. Our design achieves 2% higher Expected Average Overlap (EAO) than ECO and C-COT in the VOT2017 Challenge.
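
The cross-correlation step described above can be illustrated with a minimal sketch. It is not the SiamVGG implementation: it assumes single-channel feature maps stored as plain Python lists and skips the shared VGG-16 backbone entirely, and the names `cross_correlate` and `locate_peak` are hypothetical. The idea shown is only that the exemplar features are slid over the search features, and the peak of the resulting score map indicates the most likely target position.

```python
def cross_correlate(search, exemplar):
    """Slide the exemplar feature map over the search feature map.

    Both inputs are 2-D lists (single-channel maps, a simplifying assumption;
    a real tracker would use multi-channel backbone features).
    Returns a 2-D score map where a higher value means a better match.
    """
    sh, sw = len(search), len(search[0])
    eh, ew = len(exemplar), len(exemplar[0])
    scores = []
    for y in range(sh - eh + 1):
        row = []
        for x in range(sw - ew + 1):
            # Inner product between the exemplar and the window at (y, x).
            row.append(sum(
                search[y + i][x + j] * exemplar[i][j]
                for i in range(eh)
                for j in range(ew)
            ))
        scores.append(row)
    return scores


def locate_peak(scores):
    """Return the (row, col) position of the highest score."""
    best_pos, best_val = (0, 0), scores[0][0]
    for y, row in enumerate(scores):
        for x, value in enumerate(row):
            if value > best_val:
                best_pos, best_val = (y, x), value
    return best_pos


# Toy example: the exemplar pattern is embedded in the search map
# with its top-left corner at row 1, column 2.
exemplar = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]

scores = cross_correlate(search, exemplar)
peak = locate_peak(scores)  # position of the best match in the score map
```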
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Tracking | VOT2017 | Expected Average Overlap (EAO) | 0.286 | SiamVGG |
| Object Tracking | VOT2016 | Expected Average Overlap (EAO) | 0.351 | SiamVGG |
| Object Tracking | OTB-50 | AUC | 0.61 | SiamVGG |
| Object Tracking | OTB-2013 | AUC | 0.665 | SiamVGG |
| Object Tracking | OTB-2015 | AUC | 0.654 | SiamVGG |
| Visual Object Tracking | VOT2017 | Expected Average Overlap (EAO) | 0.286 | SiamVGG |
| Visual Object Tracking | VOT2016 | Expected Average Overlap (EAO) | 0.351 | SiamVGG |
| Visual Object Tracking | OTB-50 | AUC | 0.61 | SiamVGG |
| Visual Object Tracking | OTB-2013 | AUC | 0.665 | SiamVGG |
| Visual Object Tracking | OTB-2015 | AUC | 0.654 | SiamVGG |
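
For readers unfamiliar with the AUC metric in the table, the sketch below shows how an OTB-style success-plot AUC is commonly computed: the fraction of frames whose overlap (IoU) with the ground truth exceeds a threshold, averaged over a sweep of thresholds. The box format `(x, y, width, height)`, the 21 evenly spaced thresholds, and the function names are assumptions made for illustration; the official benchmark toolkit should be treated as the authority. EAO on VOT follows a different protocol and is not covered here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(predicted, ground_truth, num_thresholds=21):
    """Area under the success plot, approximated as the mean success rate.

    Assumes thresholds are evenly spaced on [0, 1] and that a frame counts
    as a success when its overlap is strictly greater than the threshold.
    """
    overlaps = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    success = [
        sum(o > t for o in overlaps) / len(overlaps)
        for t in thresholds
    ]
    return sum(success) / len(success)
```

Calling `success_auc(predicted_boxes, ground_truth_boxes)` with one box per frame returns a value in [0, 1], which is the scale the AUC column in the table uses.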