Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning Spatio-Temporal Transformer for Visual Tracking

Bin Yan, Houwen Peng, Jianlong Fu, Dong Wang, Huchuan Lu

2021-03-31 · ICCV 2021 · Tasks: Visual Object Tracking, Visual Tracking, Object Tracking, Video Object Tracking
Paper · PDF · Code (official)

Abstract

In this paper, we present a new tracking architecture with an encoder-decoder transformer as the key component. The encoder models the global spatio-temporal feature dependencies between target objects and search regions, while the decoder learns a query embedding to predict the spatial positions of the target objects. Our method casts object tracking as a direct bounding box prediction problem, without using any proposals or predefined anchors. With the encoder-decoder transformer, objects are predicted with a simple fully convolutional network that estimates their corners directly. The whole method is end-to-end and does not need any post-processing steps such as cosine windowing or bounding-box smoothing, which largely simplifies existing tracking pipelines. The proposed tracker achieves state-of-the-art performance on five challenging short-term and long-term benchmarks, while running at real-time speed, 6x faster than Siam R-CNN. Code and models are open-sourced at https://github.com/researchmm/Stark.
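The corner-estimation idea described in the abstract — predicting a bounding box by taking the expected location under a probability map for each corner, rather than regressing coordinates or ranking anchors — can be sketched in NumPy. This is a simplified illustration, not the authors' implementation; the function names and the soft-argmax formulation are assumptions about one plausible way to read "estimates the corners of objects directly":

```python
import numpy as np

def soft_argmax_corner(score_map):
    """Expected (x, y) coordinate under a softmax over a 2-D score map.

    In a corner head, each spatial location carries a score for being the
    corner; the softmax turns scores into a probability map, and the
    expectation gives a differentiable sub-pixel corner estimate.
    """
    h, w = score_map.shape
    prob = np.exp(score_map - score_map.max())  # stabilized softmax
    prob /= prob.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((prob * xs).sum()), float((prob * ys).sum())

def predict_box(tl_map, br_map):
    """Box (x1, y1, x2, y2) from top-left / bottom-right corner score maps."""
    x1, y1 = soft_argmax_corner(tl_map)
    x2, y2 = soft_argmax_corner(br_map)
    return x1, y1, x2, y2

# Hypothetical 16x16 score maps with one sharp peak per corner.
tl = np.full((16, 16), -10.0)
tl[4, 5] = 10.0   # top-left corner near (x=5, y=4)
br = np.full((16, 16), -10.0)
br[12, 11] = 10.0  # bottom-right corner near (x=11, y=12)
print(predict_box(tl, br))
```

Because the expectation is differentiable, such a head can be trained end-to-end with a plain box-regression loss, with no anchors or proposal ranking — consistent with the pipeline simplification the abstract claims.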

Results

| Task                   | Dataset     | Metric               | Value | Model        |
|------------------------|-------------|----------------------|-------|--------------|
| Video                  | NT-VOT211   | AUC                  | 38.26 | STARK        |
| Video                  | NT-VOT211   | Precision            | 51.37 | STARK        |
| Object Tracking        | LaSOT       | AUC                  | 67.1  | STARK        |
| Object Tracking        | LaSOT       | Normalized Precision | 77    | STARK        |
| Object Tracking        | GOT-10k     | Average Overlap      | 68.8  | STARK        |
| Object Tracking        | GOT-10k     | Success Rate 0.5     | 78.1  | STARK        |
| Object Tracking        | AVisT       | Success Rate         | 50.5  | STARK-ST-101 |
| Object Tracking        | TrackingNet | Accuracy             | 82    | STARK        |
| Object Tracking        | TrackingNet | Normalized Precision | 86.9  | STARK        |
| Object Tracking        | TrackingNet | Precision            | 79.1  | STARK        |
| Object Tracking        | NT-VOT211   | AUC                  | 38.26 | STARK        |
| Object Tracking        | NT-VOT211   | Precision            | 51.37 | STARK        |
| Visual Object Tracking | LaSOT       | AUC                  | 67.1  | STARK        |
| Visual Object Tracking | LaSOT       | Normalized Precision | 77    | STARK        |
| Visual Object Tracking | GOT-10k     | Average Overlap      | 68.8  | STARK        |
| Visual Object Tracking | GOT-10k     | Success Rate 0.5     | 78.1  | STARK        |
| Visual Object Tracking | AVisT       | Success Rate         | 50.5  | STARK-ST-101 |
| Visual Object Tracking | TrackingNet | Accuracy             | 82    | STARK        |
| Visual Object Tracking | TrackingNet | Normalized Precision | 86.9  | STARK        |
| Visual Object Tracking | TrackingNet | Precision            | 79.1  | STARK        |

Related Papers

MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results (2025-07-17)
YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association (2025-07-16)
HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
What You Have is What You Track: Adaptive and Robust Multimodal Tracking (2025-07-08)
Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object Tracking (2025-07-07)
UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions (2025-07-01)
Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking (2025-06-30)
Visual and Memory Dual Adapter for Multi-Modal Object Tracking (2025-06-30)