Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Exploring Enhanced Contextual Information for Video-Level Object Tracking

Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, Dong Wang

2024-12-15 · AAAI 2025
Tasks: Visual Object Tracking · Semi-Supervised Video Object Segmentation · Visual Tracking · Object Tracking · Video Object Tracking
Paper · PDF · Code (official)

Abstract

Contextual information at the video level has become increasingly crucial for visual object tracking. However, existing methods typically use only a few tokens to convey this information, which can lead to information loss and limit their ability to fully capture the context. To address this issue, we propose a new video-level visual object tracking framework called MCITrack. It leverages Mamba's hidden states to continuously record and transmit extensive contextual information throughout the video stream, resulting in more robust object tracking. The core component of MCITrack is the Contextual Information Fusion module, which consists of a Mamba layer and a cross-attention layer. The Mamba layer stores historical contextual information, while the cross-attention layer integrates this information into the current visual features of each backbone block. This module enhances the model's ability to capture and utilize contextual information at multiple levels through deep integration with the backbone. Experiments demonstrate that MCITrack achieves competitive performance across numerous benchmarks. For instance, it achieves 76.6% AUC on LaSOT and 80.0% AO on GOT-10k, establishing new state-of-the-art performance. Code and models are available at https://github.com/kangben258/MCITrack.
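The fusion mechanism described above can be illustrated with a deliberately simplified sketch: a persistent recurrent state stands in for Mamba's hidden states, accumulating context across frames, and a single cross-attention step injects that stored context into the current frame's features. All names, dimensions, and the linear-recurrence update here are illustrative assumptions, not MCITrack's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ContextFusion:
    """Toy sketch of a Contextual Information Fusion step: a recurrent
    state (standing in for Mamba hidden states) accumulates context
    over frames, and cross-attention injects it into current features."""

    def __init__(self, dim, state_tokens=4):
        # Hypothetical random projections; the real model learns these.
        self.Wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.decay = 0.9                             # state retention factor
        self.state = np.zeros((state_tokens, dim))   # persistent context

    def step(self, feats):
        # feats: (tokens, dim) visual features of the current frame.
        # 1) Update the persistent state with a pooled frame summary
        #    (a linear recurrence, loosely mimicking an SSM update).
        summary = feats.mean(axis=0, keepdims=True)
        self.state = self.decay * self.state + (1 - self.decay) * summary
        # 2) Cross-attention: queries from current features,
        #    keys/values from the stored context state.
        q, k, v = feats @ self.Wq, self.state @ self.Wk, self.state @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        return feats + attn @ v                      # residual fusion
```

In the actual framework this fusion is applied at multiple backbone blocks, so each level of the feature hierarchy can read from the accumulated video-level context rather than from a handful of tokens.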

Results

Results are reported for the two model variants, MCITrack-L384 and MCITrack-B224. The VOT2020 EAO scores appear under the Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation tasks; all remaining rows appear under both Object Tracking and Visual Object Tracking. Duplicated rows are collapsed below.

| Dataset | Metric | MCITrack-L384 | MCITrack-B224 |
|---|---|---|---|
| VOT2020 | EAO | 0.624 | 0.619 |
| TNL2K | AUC | 65.3 | 62.9 |
| LaSOT | AUC | 76.6 | 75.3 |
| LaSOT | Normalized Precision | 86.1 | 85.6 |
| LaSOT | Precision | 85.0 | 83.3 |
| GOT-10k | Average Overlap | 80.0 | 77.9 |
| GOT-10k | Success Rate 0.5 | 88.5 | 88.2 |
| GOT-10k | Success Rate 0.75 | 80.2 | 76.8 |
| LaSOT-ext | AUC | 55.7 | 54.6 |
| LaSOT-ext | Normalized Precision | 66.5 | 65.7 |
| LaSOT-ext | Precision | 62.9 | 62.1 |
| TrackingNet | Accuracy | 87.9 | 86.3 |
| TrackingNet | Normalized Precision | 92.1 | 90.9 |
| TrackingNet | Precision | 89.2 | 86.1 |
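The GOT-10k rows above follow the benchmark's standard definitions: Average Overlap (AO) is the mean per-frame IoU between predicted and ground-truth boxes, and Success Rate at threshold t (0.5 or 0.75) is the fraction of frames whose IoU exceeds t. A minimal sketch of these computations, using a hypothetical helper rather than the paper's evaluation code:

```python
import numpy as np

def got10k_metrics(ious):
    """Summarize a sequence of per-frame IoUs with GOT-10k-style metrics.

    AO is the mean IoU over all frames; SR@t is the fraction of
    frames whose IoU exceeds the threshold t.
    """
    ious = np.asarray(ious, dtype=float)
    return {
        "AO": ious.mean(),
        "SR@0.5": (ious > 0.5).mean(),
        "SR@0.75": (ious > 0.75).mean(),
    }

# Example: four frames of overlap between prediction and ground truth.
m = got10k_metrics([0.9, 0.8, 0.6, 0.4])
# AO = 0.675, SR@0.5 = 0.75, SR@0.75 = 0.5
```

AUC on LaSOT/TNL2K is computed differently (area under the success curve across all overlap thresholds), but reduces to the same per-frame IoU bookkeeping.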

Related Papers

- MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results (2025-07-17)
- YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association (2025-07-16)
- HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
- What You Have is What You Track: Adaptive and Robust Multimodal Tracking (2025-07-08)
- Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object Tracking (2025-07-07)
- UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions (2025-07-01)
- Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking (2025-06-30)
- Visual and Memory Dual Adapter for Multi-Modal Object Tracking (2025-06-30)