
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang

2024-11-18 · Visual Object Tracking · Visual Tracking · Object Tracking
Paper · PDF · Code (official)

Abstract

The Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks but faces challenges in visual object tracking, particularly when managing crowded scenes with fast-moving or self-occluding objects. Furthermore, the fixed-window memory approach in the original model does not consider the quality of memories selected to condition the image features for the next frame, leading to error propagation in videos. This paper introduces SAMURAI, an enhanced adaptation of SAM 2 specifically designed for visual object tracking. By incorporating temporal motion cues with the proposed motion-aware memory selection mechanism, SAMURAI effectively predicts object motion and refines mask selection, achieving robust, accurate tracking without the need for retraining or fine-tuning. SAMURAI operates in real-time and demonstrates strong zero-shot performance across diverse benchmark datasets, showcasing its ability to generalize without fine-tuning. In evaluations, SAMURAI achieves significant improvements in success rate and precision over existing trackers, with a 7.1% AUC gain on LaSOT$_{\text{ext}}$ and a 3.5% AO gain on GOT-10k. Moreover, it achieves competitive results compared to fully supervised methods on LaSOT, underscoring its robustness in complex tracking scenarios and its potential for real-world applications in dynamic environments.
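To make the mechanism concrete, the sketch below illustrates the two ideas the abstract describes: hybrid mask selection (blending SAM 2's own mask affinity with each candidate's agreement against a motion forecast, e.g. from a constant-velocity Kalman filter) and quality-gated memory selection. This is a minimal sketch of the idea, not the official implementation; the function names, the weight `alpha`, and the thresholds are illustrative assumptions rather than the paper's interface or settings.

```python
import numpy as np

def mask_to_box(mask):
    """Tight bounding box (x1, y1, x2, y2) around a binary mask."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()], dtype=float)

def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_mask(candidate_masks, affinity_scores, predicted_box, alpha=0.25):
    """Score each SAM 2 mask candidate by a blend of its own affinity (predicted IoU)
    and its agreement with the motion forecast, then keep the best one.
    alpha is an illustrative weight, not the paper's value."""
    scores = [
        alpha * box_iou(mask_to_box(m), predicted_box) + (1.0 - alpha) * s
        for m, s in zip(candidate_masks, affinity_scores)
    ]
    best = int(np.argmax(scores))
    return candidate_masks[best], scores[best]

def select_memory_frames(history, max_frames=7,
                         min_affinity=0.5, min_object=0.0, min_motion=0.5):
    """Condition the next frame only on recent frames whose quality scores pass all
    thresholds (thresholds here are placeholders, not the paper's settings).
    history: list of dicts with per-frame 'affinity', 'object', and 'motion' scores."""
    good = [f for f in history
            if f["affinity"] > min_affinity
            and f["object"] > min_object
            and f["motion"] > min_motion]
    return good[-max_frames:]
```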

Results

Task | Dataset | Metric | Value | Model
Object Tracking | LaSOT | AUC | 74.2 | SAMURAI-L
Object Tracking | LaSOT | Normalized Precision | 82.7 | SAMURAI-L
Object Tracking | LaSOT | Precision | 80.2 | SAMURAI-L
Object Tracking | NeedForSpeed | AUC | 0.692 | SAMURAI-L
Object Tracking | DiDi | Tracking quality | 0.68 | SAMURAI
Object Tracking | GOT-10k | Average Overlap | 81.7 | SAMURAI-L
Object Tracking | GOT-10k | Success Rate 0.5 | 92.2 | SAMURAI-L
Object Tracking | GOT-10k | Success Rate 0.75 | 76.9 | SAMURAI-L
Object Tracking | LaSOT-ext | AUC | 61 | SAMURAI-L
Object Tracking | LaSOT-ext | Normalized Precision | 73.9 | SAMURAI-L
Object Tracking | LaSOT-ext | Precision | 72.2 | SAMURAI-L
Object Tracking | TrackingNet | Accuracy | 85.3 | SAMURAI-L
Object Tracking | OTB-2015 | AUC | 0.715 | SAMURAI-L
Visual Object Tracking | LaSOT | AUC | 74.2 | SAMURAI-L
Visual Object Tracking | LaSOT | Normalized Precision | 82.7 | SAMURAI-L
Visual Object Tracking | LaSOT | Precision | 80.2 | SAMURAI-L
Visual Object Tracking | NeedForSpeed | AUC | 0.692 | SAMURAI-L
Visual Object Tracking | DiDi | Tracking quality | 0.68 | SAMURAI
Visual Object Tracking | GOT-10k | Average Overlap | 81.7 | SAMURAI-L
Visual Object Tracking | GOT-10k | Success Rate 0.5 | 92.2 | SAMURAI-L
Visual Object Tracking | GOT-10k | Success Rate 0.75 | 76.9 | SAMURAI-L
Visual Object Tracking | LaSOT-ext | AUC | 61 | SAMURAI-L
Visual Object Tracking | LaSOT-ext | Normalized Precision | 73.9 | SAMURAI-L
Visual Object Tracking | LaSOT-ext | Precision | 72.2 | SAMURAI-L
Visual Object Tracking | TrackingNet | Accuracy | 85.3 | SAMURAI-L
Visual Object Tracking | OTB-2015 | AUC | 0.715 | SAMURAI-L
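For reference, these are the standard one-pass tracking metrics: AUC is the area under the success-rate curve swept over IoU thresholds, Success Rate 0.5/0.75 is the fraction of frames with overlap above that threshold, and Average Overlap (GOT-10k's AO) is the mean IoU. A minimal sketch of how they are computed from a per-frame IoU list follows; the `ious` input is an assumed placeholder, and note the source reports some values as percentages and others as fractions.

```python
import numpy as np

def success_rate(ious, threshold):
    """Fraction of frames whose predicted-vs-ground-truth IoU exceeds the threshold."""
    ious = np.asarray(ious, dtype=float)
    return float((ious > threshold).mean())

def success_auc(ious, thresholds=np.linspace(0.0, 1.0, 101)):
    """Area under the success-rate curve over IoU thresholds (LaSOT/OTB-style AUC)."""
    return float(np.mean([success_rate(ious, t) for t in thresholds]))

def average_overlap(ious):
    """GOT-10k-style AO: mean IoU over all frames."""
    return float(np.mean(np.asarray(ious, dtype=float)))
```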

Related Papers

- MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results (2025-07-17)
- YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association (2025-07-16)
- HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
- What You Have is What You Track: Adaptive and Robust Multimodal Tracking (2025-07-08)
- Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object Tracking (2025-07-07)
- UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions (2025-07-01)
- Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking (2025-06-30)
- Visual and Memory Dual Adapter for Multi-Modal Object Tracking (2025-06-30)