

Improving Visual Object Tracking through Visual Prompting

Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin

2024-09-27 · Visual Object Tracking · Visual Tracking · Object Tracking
Paper · PDF · Code (official)

Abstract

Learning a discriminative model to distinguish a target from its surrounding distractors is essential to generic visual object tracking. Dynamically adapting the target representation against distractors is challenging due to the limited discriminative capabilities of prevailing trackers. We present a new visual Prompting mechanism for generic Visual Object Tracking (PiVOT) to address this issue. PiVOT employs a prompt generation network built on the pre-trained foundation model CLIP to automatically generate and refine visual prompts, enabling the transfer of foundation model knowledge for tracking. While CLIP offers broad category-level knowledge, the tracker, trained on instance-specific data, excels at recognizing unique object instances. Thus, PiVOT first compiles a visual prompt highlighting potential target locations. To transfer the knowledge of CLIP to the tracker, PiVOT leverages CLIP to refine the visual prompt based on the similarities between candidate objects and the reference templates across potential targets. Once refined, the visual prompt better highlights potential target locations, thereby reducing irrelevant prompt information. With the proposed prompting mechanism, the tracker can generate improved instance-aware feature maps under the guidance of the visual prompt, thus effectively suppressing distractors. The proposed method does not involve CLIP during training, thereby keeping the same training complexity and preserving the generalization capability of the pre-trained foundation model. Extensive experiments across multiple benchmarks indicate that PiVOT, using the proposed prompting method, can suppress distracting objects and enhance tracking performance.
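The refinement step described above — scoring candidate target locations by their similarity to reference templates and down-weighting distractors — can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the paper's implementation: `candidate_embs` and `template_emb` stand in for CLIP image embeddings of candidate crops and the reference template, and `refine_prompt` is a hypothetical helper name.

```python
import numpy as np

def refine_prompt(prompt_scores, candidate_embs, template_emb):
    """Reweight per-candidate prompt scores by CLIP-style similarity.

    prompt_scores:  (N,) initial scores for candidate target locations
    candidate_embs: (N, D) embeddings of candidate crops (e.g. from CLIP)
    template_emb:   (D,) embedding of the reference template
    """
    # Cosine similarity between each candidate and the template, in [-1, 1]
    cand = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    templ = template_emb / np.linalg.norm(template_emb)
    sims = cand @ templ
    # Map similarities to [0, 1] and use them as soft weights, so candidates
    # dissimilar to the template (likely distractors) are suppressed
    weights = (sims + 1.0) / 2.0
    return prompt_scores * weights
```

A candidate whose embedding aligns with the template keeps its prompt score, while an opposing embedding is driven toward zero; the refined scores could then be rendered back into a spatial prompt map for the tracker.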

Results

Task                    Dataset       Metric                Value   Model
Object Tracking         LaSOT         AUC                   73.4    PiVOT-L
Object Tracking         LaSOT         Normalized Precision  84.7    PiVOT-L
Object Tracking         LaSOT         Precision             82.1    PiVOT-L
Object Tracking         NeedForSpeed  AUC                   0.682   PiVOT-L
Object Tracking         AVisT         Success Rate          62.2    PiVOT-L
Object Tracking         OTB-2015      AUC                   0.712   PiVOT-L
Object Tracking         OTB-2015      Precision             0.946   PiVOT-L
Visual Object Tracking  LaSOT         AUC                   73.4    PiVOT-L
Visual Object Tracking  LaSOT         Normalized Precision  84.7    PiVOT-L
Visual Object Tracking  LaSOT         Precision             82.1    PiVOT-L
Visual Object Tracking  NeedForSpeed  AUC                   0.682   PiVOT-L
Visual Object Tracking  AVisT         Success Rate          62.2    PiVOT-L
Visual Object Tracking  OTB-2015      AUC                   0.712   PiVOT-L
Visual Object Tracking  OTB-2015      Precision             0.946   PiVOT-L

Related Papers

- MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results (2025-07-17)
- YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association (2025-07-16)
- HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
- What You Have is What You Track: Adaptive and Robust Multimodal Tracking (2025-07-08)
- Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object Tracking (2025-07-07)
- UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions (2025-07-01)
- Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking (2025-06-30)
- Visual and Memory Dual Adapter for Multi-Modal Object Tracking (2025-06-30)