Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ODTrack: Online Dense Temporal Token Learning for Visual Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shengping Zhang, Xianxian Li

2024-01-03 | Tasks: Visual Object Tracking, Semi-Supervised Video Object Segmentation, Visual Tracking, Video Object Tracking

Abstract

Online contextual reasoning and association across consecutive video frames are critical for perceiving instances in visual tracking. However, most current top-performing trackers persistently rely on sparse temporal relationships between reference and search frames in an offline mode. Consequently, they can only interact independently within each image pair and establish limited temporal correlations. To alleviate this problem, we propose a simple, flexible and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames in an online token propagation manner. ODTrack receives video frames of arbitrary length to capture the spatio-temporal trajectory relationships of an instance, and compresses the discriminative features (localization information) of a target into a token sequence to achieve frame-to-frame association. This new solution brings the following benefits: 1) the purified token sequences can serve as prompts for inference in the next video frame, so that past information is leveraged to guide future inference; 2) complex online update strategies are effectively avoided by the iterative propagation of token sequences, yielding more efficient model representation and computation. ODTrack achieves new state-of-the-art (SOTA) performance on seven benchmarks while running at real-time speed. Code and models are available at https://github.com/GXNU-ZhongLab/ODTrack.
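The abstract describes online token propagation only at a high level. The following is a toy sketch of that idea, not ODTrack's implementation: plain float vectors stand in for learned features, a single scaled dot-product attention step (with values equal to keys) stands in for the learned network, and the names `track_video`, `attend`, and `locate` are hypothetical. What it does show is the data flow the abstract describes: the token produced at one frame is fed back in as a prompt at the next frame, so no separate template-update rule is needed, and the loop accepts a video of any length.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Toy single-head scaled dot-product attention; values are the keys (assumption)."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(query))]


def track_video(reference: List[Vector], frames: List[List[Vector]],
                init_token: Vector) -> List[Vector]:
    """Propagate one temporal token through a video of any length, frame by frame."""
    token = init_token
    history = []
    for search in frames:
        # The token produced at frame t-1 is fed in alongside the reference and
        # search tokens of frame t, acting as a prompt that carries past information.
        token = attend(token, reference + search + [token])
        history.append(token)
    return history


def locate(token: Vector, search: List[Vector]) -> int:
    """Hypothetical localization step: index of the search token most similar to the temporal token."""
    scores = [dot(token, s) for s in search]
    return scores.index(max(scores))


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.8, 0.2]]        # made-up template tokens
    frames = [[[0.9, 0.1], [0.0, 1.0]],         # frame 1: target-like token at index 0
              [[0.1, 0.9], [0.95, 0.05]]]       # frame 2: target-like token moved to index 1
    tokens = track_video(reference, frames, init_token=[1.0, 0.0])
    print([locate(t, f) for t, f in zip(tokens, frames)])  # prints [0, 1]
```

In this toy, the propagated token stays close to the target's appearance, so `locate` follows the target even when its position among the search tokens changes between frames.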

Results

Dataset      Metric             Value   Model      Task categories
VOT2020      EAO                0.605   ODTrack-L  Video; Video Object Segmentation; Semi-Supervised Video Object Segmentation
VOT2020      EAO                0.581   ODTrack-B  Video; Video Object Segmentation; Semi-Supervised Video Object Segmentation
NT-VOT211    AUC                39.6    ODTrack    Video; Object Tracking
NT-VOT211    Precision          55.8    ODTrack    Video; Object Tracking
TNL2K        AUC                61.7    ODTrack-L  Object Tracking; Visual Object Tracking
TNL2K        AUC                60.9    ODTrack-B  Object Tracking; Visual Object Tracking
LaSOT        AUC                74      ODTrack-L  Object Tracking; Visual Object Tracking
LaSOT        AUC                73.2    ODTrack-B  Object Tracking; Visual Object Tracking
DiDi         Tracking quality   0.608   ODTrack    Object Tracking; Visual Object Tracking
GOT-10k      Average Overlap    78.2    ODTrack-L  Object Tracking; Visual Object Tracking
GOT-10k      Average Overlap    77      ODTrack-B  Object Tracking; Visual Object Tracking
LaSOT-ext    AUC                53.9    ODTrack-L  Object Tracking; Visual Object Tracking
LaSOT-ext    AUC                52.4    ODTrack-B  Object Tracking; Visual Object Tracking
TrackingNet  Accuracy           86.1    ODTrack-L  Object Tracking; Visual Object Tracking
TrackingNet  Accuracy           85.1    ODTrack-B  Object Tracking; Visual Object Tracking
OTB-2015     AUC                0.724   ODTrack-L  Object Tracking; Visual Object Tracking
OTB-2015     AUC                0.723   ODTrack-B  Object Tracking; Visual Object Tracking

Values are as listed in the source: AUC, Average Overlap, Precision and Accuracy are on a 0-100 scale, except the OTB-2015 AUC, which (like EAO and Tracking quality) is on a 0-1 scale.
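Most rows in the Results table report AUC (the area under the success plot) or GOT-10k's Average Overlap. As a hedged illustration of what those two numbers measure, here is a minimal sketch computing both from per-frame intersection-over-union (IoU). Assumptions: boxes are `(x, y, width, height)`; the AUC uses 21 evenly spaced thresholds in [0, 1] with a strict `>` comparison, a common convention, but official benchmark toolkits may differ in threshold grid, averaging across sequences, and handling of frames where the target is absent. The function names are illustrative, and EAO, Precision, Tracking quality and Accuracy are not covered.

```python
from typing import Sequence, Tuple

# Box as (x, y, width, height); this convention is an assumption of the sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(ious: Sequence[float], num_thresholds: int = 21) -> float:
    """Area under the success plot: mean success rate over evenly spaced IoU thresholds."""
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Success rate at threshold t = fraction of frames whose IoU exceeds t.
    rates = [sum(1 for v in ious if v > t) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


def average_overlap(ious: Sequence[float]) -> float:
    """Mean per-frame IoU for a single sequence."""
    return sum(ious) / len(ious)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10)] * 4
    pred = [(0, 0, 10, 10), (5, 0, 10, 10), (0, 0, 5, 10), (20, 20, 10, 10)]
    ious = [iou(p, g) for p, g in zip(pred, gt)]  # [1.0, 0.333..., 0.5, 0.0]
    print(round(success_auc(ious), 4), round(average_overlap(ious), 4))  # prints 0.4405 0.4583
```

On the table's 0-100 scale, an AUC of 0.4405 would be reported as 44.05; the OTB-2015 row instead keeps the 0-1 scale (0.724).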

Related Papers

HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
What You Have is What You Track: Adaptive and Robust Multimodal Tracking (2025-07-08)
UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions (2025-07-01)
Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking (2025-06-30)
R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning (2025-06-27)
Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking (2025-06-25)
Comparison of Two Methods for Stationary Incident Detection Based on Background Image (2025-06-17)
THU-Warwick Submission for EPIC-KITCHEN Challenge 2025: Semi-Supervised Video Object Segmentation (2025-06-07)