Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

Liting Lin, Heng Fan, Zhipeng Zhang, YaoWei Wang, Yong Xu, Haibin Ling

2024-03-08 · Visual Object Tracking · Visual Tracking · Parameter-Efficient Fine-Tuning

Paper · PDF · Code (official)

Abstract

Motivated by Parameter-Efficient Fine-Tuning (PEFT) in large language models, we propose LoRAT, a method that unveils the power of large ViT models for tracking within laboratory-level resources. The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency, to the domain of visual tracking. However, unique challenges and potential domain gaps make this transfer less straightforward than it first appears. Firstly, a transformer-based tracker constructs unshared position embeddings for the template and search images. This poses a challenge for transferring LoRA, which usually requires design consistency between the pre-trained backbone and the downstream task. Secondly, the inductive bias inherent in convolutional heads diminishes the effectiveness of parameter-efficient fine-tuning in tracking models. To overcome these limitations, we first decouple the position embeddings in transformer-based trackers into shared spatial ones and independent type ones. The shared embeddings, which describe the absolute coordinates of multi-resolution images (namely, the template and search images), are inherited from the pre-trained backbones. In contrast, the independent embeddings indicate the source of each token and are learned from scratch. Furthermore, we design an anchor-free head solely based on MLP to adapt PETR, enabling better performance with less computational overhead. With our design, 1) it becomes practical to train trackers with the ViT-g backbone on GPUs with only 25.8 GB of memory (batch size of 16); 2) we reduce the training time of the L-224 variant from 35.0 to 10.8 GPU hours; 3) we improve the LaSOT SUC score from 0.703 to 0.742 with the L-224 variant; 4) we increase the inference speed of the L-224 variant from 52 to 119 FPS. Code and models are available at https://github.com/LitingLin/LoRAT.
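A key property the abstract relies on is that LoRA adds no inference latency: the trained low-rank update can be merged back into the frozen weight, leaving a single matmul at inference time. A minimal NumPy sketch of this merge (illustrative shapes only, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 768, 8                        # hidden size of a ViT layer; LoRA rank r << d
W = rng.normal(size=(d, d))          # frozen pre-trained weight (not updated)
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor (down-projection)
B = rng.normal(size=(d, r)) * 0.01   # trainable low-rank factor (up-projection);
                                     # in practice B is initialized to zero and
                                     # becomes nonzero during fine-tuning

x = rng.normal(size=(d,))

# During training: the forward pass adds a low-rank side path.
y_train = W @ x + B @ (A @ x)

# After training: fold the update into W once; inference is a single matmul,
# identical in cost to the original frozen model.
W_merged = W + B @ A
y_infer = W_merged @ x

assert np.allclose(y_train, y_infer)

# Only A and B are trained: 2*d*r parameters instead of d*d.
print(A.size + B.size, "trainable vs", W.size, "frozen")
```

The trainable parameter count here is 2·d·r = 12,288 versus d² = 589,824 for the full weight, which is the mechanism behind the memory and training-time savings the abstract reports.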

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Tracking | TNL2K | AUC | 62.7 | LoRAT-g-378 |
| Object Tracking | TNL2K | Precision | 67.8 | LoRAT-g-378 |
| Object Tracking | TNL2K | AUC | 62.3 | LoRAT-L-378 |
| Object Tracking | TNL2K | Precision | 67 | LoRAT-L-378 |
| Object Tracking | UAV123 | AUC | 0.739 | LoRAT-g-378 |
| Object Tracking | UAV123 | AUC | 0.725 | LoRAT-L-378 |
| Object Tracking | LaSOT | AUC | 76.2 | LoRAT-g-378 |
| Object Tracking | LaSOT | Normalized Precision | 85.3 | LoRAT-g-378 |
| Object Tracking | LaSOT | Precision | 83.5 | LoRAT-g-378 |
| Object Tracking | LaSOT | AUC | 75.1 | LoRAT-L-378 |
| Object Tracking | LaSOT | Normalized Precision | 84.1 | LoRAT-L-378 |
| Object Tracking | LaSOT | Precision | 82 | LoRAT-L-378 |
| Object Tracking | NeedForSpeed | AUC | 0.681 | LoRAT-g-378 |
| Object Tracking | NeedForSpeed | AUC | 0.667 | LoRAT-L-378 |
| Object Tracking | GOT-10k | Average Overlap | 78.9 | LoRAT-g-378 |
| Object Tracking | GOT-10k | Success Rate 0.5 | 87.8 | LoRAT-g-378 |
| Object Tracking | GOT-10k | Success Rate 0.75 | 80.7 | LoRAT-g-378 |
| Object Tracking | GOT-10k | Average Overlap | 77.5 | LoRAT-L-378 |
| Object Tracking | GOT-10k | Success Rate 0.5 | 86.2 | LoRAT-L-378 |
| Object Tracking | GOT-10k | Success Rate 0.75 | 78.1 | LoRAT-L-378 |
| Object Tracking | LaSOT-ext | AUC | 56.6 | LoRAT-L-378 |
| Object Tracking | LaSOT-ext | Normalized Precision | 69 | LoRAT-L-378 |
| Object Tracking | LaSOT-ext | Precision | 65.1 | LoRAT-L-378 |
| Object Tracking | LaSOT-ext | AUC | 56.5 | LoRAT-g-378 |
| Object Tracking | LaSOT-ext | Normalized Precision | 69 | LoRAT-g-378 |
| Object Tracking | LaSOT-ext | Precision | 64.9 | LoRAT-g-378 |
| Object Tracking | TrackingNet | Accuracy | 86 | LoRAT-g-378 |
| Object Tracking | TrackingNet | Normalized Precision | 90.2 | LoRAT-g-378 |
| Object Tracking | TrackingNet | Precision | 86.1 | LoRAT-g-378 |
| Object Tracking | TrackingNet | Accuracy | 85.6 | LoRAT-L-378 |
| Object Tracking | TrackingNet | Normalized Precision | 89.7 | LoRAT-L-378 |
| Object Tracking | TrackingNet | Precision | 85.4 | LoRAT-L-378 |

Related Papers

- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
- What You Have is What You Track: Adaptive and Robust Multimodal Tracking (2025-07-08)
- LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization (2025-07-06)
- UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions (2025-07-01)
- Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking (2025-06-30)
- R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning (2025-06-27)
- Exploring Adapter Design Tradeoffs for Low Resource Music Generation (2025-06-26)
- WordCon: Word-level Typography Control in Scene Text Rendering (2025-06-26)