Papers With Code

Data sourced from the PWC Archive (CC-BY-SA 4.0).

Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking

Xin Chen, Ben Kang, Jiawen Zhu, Dong Wang, Houwen Peng, Huchuan Lu

2023-04-27 · CVPR 2023 · Tasks: Visual Object Tracking, Visual Tracking, RGB-T Tracking, Object Tracking

Paper · PDF · Code (official)

Abstract

In this paper, we introduce a new sequence-to-sequence learning framework for RGB-based and multi-modal object tracking. First, we present SeqTrack for RGB-based tracking. It casts visual tracking as a sequence generation task, forecasting object bounding boxes in an autoregressive manner. This differs from previous trackers, which depend on the design of intricate head networks, such as classification and regression heads. SeqTrack employs a basic encoder-decoder transformer architecture. The encoder utilizes a bidirectional transformer for feature extraction, while the decoder generates bounding box sequences autoregressively using a causal transformer. The loss function is a plain cross-entropy. Second, we introduce SeqTrackv2, a unified sequence-to-sequence framework for multi-modal tracking tasks. Expanding upon SeqTrack, SeqTrackv2 integrates a unified interface for auxiliary modalities and a set of task-prompt tokens to specify the task. This enables it to manage multi-modal tracking tasks using a unified model and parameter set. This sequence learning paradigm not only simplifies the tracking framework, but also showcases superior performance across 14 challenging benchmarks spanning five single- and multi-modal tracking tasks. The code and models are available at https://github.com/chenxin-dlut/SeqTrackv2.
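The core casting described in the abstract is that a bounding box becomes a short sequence of discrete tokens, which a causal decoder predicts one at a time. The minimal sketch below illustrates that idea only: the bin count, token layout, and the toy stand-in "decoder" are illustrative assumptions, not the paper's actual architecture or vocabulary.

```python
# Sketch of sequence-generation tracking: quantize a box [x, y, w, h]
# into integer tokens, decode them autoregressively, map back to
# coordinates. N_BINS and the toy decoder are assumptions for
# illustration, not values from the paper.

N_BINS = 4000  # assumed quantization resolution


def box_to_tokens(box, n_bins=N_BINS):
    """Quantize normalized [x, y, w, h] in [0, 1] into integer tokens."""
    return [min(round(v * n_bins), n_bins - 1) for v in box]


def tokens_to_box(tokens, n_bins=N_BINS):
    """Map integer tokens back to bin-center normalized coordinates."""
    return [(t + 0.5) / n_bins for t in tokens]


def autoregressive_decode(decoder_step, seq_len=4):
    """Generate seq_len tokens one at a time; each step conditions on
    the previously generated tokens, as in a causal decoder."""
    tokens = []
    for _ in range(seq_len):
        tokens.append(decoder_step(tokens))
    return tokens


# Toy stand-in for the learned decoder: emits a token that depends only
# on how many tokens were generated so far, just to exercise the loop.
toy_decoder = lambda prefix: 2000 + 10 * len(prefix)

tokens = autoregressive_decode(toy_decoder)  # [2000, 2010, 2020, 2030]
box = tokens_to_box(tokens)
```

With this casting, training reduces to next-token prediction, which is why a plain cross-entropy loss suffices in place of separate classification and regression heads.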

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Visual Tracking | LasHeR | Precision | 76.7 | SeqTrackv2-L384 |
| Visual Tracking | LasHeR | Success | 61 | SeqTrackv2-L384 |
| Visual Tracking | LasHeR | Precision | 74.1 | SeqTrackv2-L256 |
| Visual Tracking | LasHeR | Success | 58.8 | SeqTrackv2-L256 |
| Visual Tracking | LasHeR | Precision | 71.5 | SeqTrackv2-B384 |
| Visual Tracking | LasHeR | Success | 56.2 | SeqTrackv2-B384 |
| Visual Tracking | LasHeR | Precision | 70.4 | SeqTrackv2-B256 |
| Visual Tracking | LasHeR | Success | 55.8 | SeqTrackv2-B256 |
| Visual Tracking | RGBT234 | Precision | 92.3 | SeqTrackv2-L256 |
| Visual Tracking | RGBT234 | Success | 68.5 | SeqTrackv2-L256 |
| Visual Tracking | RGBT234 | Precision | 91.3 | SeqTrackv2-L384 |
| Visual Tracking | RGBT234 | Success | 68 | SeqTrackv2-L384 |
| Visual Tracking | RGBT234 | Precision | 90 | SeqTrackv2-B384 |
| Visual Tracking | RGBT234 | Success | 66.3 | SeqTrackv2-B384 |
| Visual Tracking | RGBT234 | Precision | 88 | SeqTrackv2-B256 |
| Visual Tracking | RGBT234 | Success | 64.7 | SeqTrackv2-B256 |
| Object Tracking | TNL2K | AUC | 57.8 | SeqTrack-L384 |
| Object Tracking | UAV123 | AUC | 0.685 | SeqTrack-L384 |
| Object Tracking | LaSOT | AUC | 72.5 | SeqTrack-L384 |
| Object Tracking | LaSOT | Normalized Precision | 81.5 | SeqTrack-L384 |
| Object Tracking | LaSOT | Precision | 79.3 | SeqTrack-L384 |
| Object Tracking | NeedForSpeed | AUC | 0.662 | SeqTrack-L384 |
| Object Tracking | GOT-10k | Average Overlap | 74.8 | SeqTrack-L384 |
| Object Tracking | GOT-10k | Success Rate 0.5 | 81.9 | SeqTrack-L384 |
| Object Tracking | GOT-10k | Success Rate 0.75 | 72.2 | SeqTrack-L384 |
| Object Tracking | LaSOT-ext | AUC | 50.7 | SeqTrack-L384 |
| Object Tracking | LaSOT-ext | Normalized Precision | 61.6 | SeqTrack-L384 |
| Object Tracking | LaSOT-ext | Precision | 57.5 | SeqTrack-L384 |
| Object Tracking | TrackingNet | Accuracy | 85.5 | SeqTrack-L384 |
| Object Tracking | TrackingNet | Normalized Precision | 89.8 | SeqTrack-L384 |
| Object Tracking | TrackingNet | Precision | 85.8 | SeqTrack-L384 |
| Object Tracking | OTB-2015 | AUC | 0.683 | SeqTrack-L384 |

Related Papers

- MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results (2025-07-17)
- YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association (2025-07-16)
- HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
- What You Have is What You Track: Adaptive and Robust Multimodal Tracking (2025-07-08)
- Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object Tracking (2025-07-07)
- UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions (2025-07-01)
- Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking (2025-06-30)
- Visual and Memory Dual Adapter for Multi-Modal Object Tracking (2025-06-30)