Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer

2024-08-01

Tasks: Visual Object Tracking, Semi-Supervised Video Object Segmentation, Segmentation, Semantic Segmentation, Video Segmentation, Video Object Segmentation, Video Semantic Segmentation, Image Segmentation, Robot Manipulation Generalization

Abstract

We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. Our model is a simple transformer architecture with streaming memory for real-time video processing. SAM 2 trained on our data provides strong performance across a wide range of tasks. In video segmentation, we observe better accuracy, using 3x fewer interactions than prior approaches. In image segmentation, our model is more accurate and 6x faster than the Segment Anything Model (SAM). We believe that our data, model, and insights will serve as a significant milestone for video segmentation and related perception tasks. We are releasing our main model, dataset, as well as code for model training and our demo.
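The "streaming memory" in the abstract refers to a memory bank. The SAM 2 paper describes it as FIFO queues holding memories of recent frames and of prompted frames, which the current frame's features attend to before a mask is decoded. The following is a toy sketch of that idea only, not the SAM 2 implementation. The queue sizes (`max_recent`, `max_prompted`), the names `Memory`, `StreamingMemoryBank` and `memory_attention`, and the single-head attention that reuses memory tokens as both keys and values are hypothetical simplifications. Random vectors stand in for real encoder features.

```python
from collections import deque
from dataclasses import dataclass

import numpy as np


@dataclass
class Memory:
    """Toy stand-in for one frame's memory: its index plus a (tokens, dim) feature array."""
    frame_idx: int
    tokens: np.ndarray


class StreamingMemoryBank:
    """Two FIFO queues: memories of recent frames and memories of prompted frames.

    The default sizes are illustrative assumptions, not SAM 2's configuration.
    """

    def __init__(self, max_recent: int = 6, max_prompted: int = 2) -> None:
        # deque(maxlen=k) drops its oldest entry when appending to a full queue (FIFO eviction).
        self.recent: deque = deque(maxlen=max_recent)
        self.prompted: deque = deque(maxlen=max_prompted)

    def add(self, memory: Memory, prompted: bool) -> None:
        (self.prompted if prompted else self.recent).append(memory)

    def tokens(self) -> np.ndarray:
        """Concatenate every stored memory into one (total_tokens, dim) array."""
        mems = list(self.prompted) + list(self.recent)
        if not mems:
            raise ValueError("memory bank is empty")
        return np.concatenate([m.tokens for m in mems], axis=0)

    def frame_indices(self) -> tuple[list[int], list[int]]:
        return [m.frame_idx for m in self.prompted], [m.frame_idx for m in self.recent]


def memory_attention(query: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product cross-attention from current-frame tokens to memory tokens.

    Simplification: memory tokens serve as both keys and values (no learned projections).
    """
    scores = query @ memory.T / np.sqrt(query.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over memory tokens
    return weights @ memory


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bank = StreamingMemoryBank(max_recent=3, max_prompted=1)
    # Frame 0 carries the user's prompt, so its memory goes into the prompted queue.
    bank.add(Memory(0, rng.standard_normal((4, 8))), prompted=True)
    for t in range(1, 6):
        frame_tokens = rng.standard_normal((4, 8))  # stand-in for image-encoder features
        conditioned = memory_attention(frame_tokens, bank.tokens())
        # Toy shortcut: store the memory-conditioned features as this frame's memory.
        bank.add(Memory(t, conditioned), prompted=False)
    print(bank.frame_indices())  # ([0], [3, 4, 5])
```

Because each queue has a fixed size, the cost of attending to memory per frame stays bounded however long the video runs. That bounded cost is what makes a streaming, frame-by-frame design practical.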

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | M³-VOS | Average IoU | 69.5 | SAM 2 |
| Video | MOSE | J&F | 77.9 | SAM 2 |
| Video | DAVIS 2017 (val) | J&F | 90.7 | SAM 2 |
| Video | DAVIS 2017 (val) | Params (M) | 224.4 | SAM 2 |
| Object Tracking | DiDi | Tracking quality | 0.649 | SAM 2.1 |
| Object Tracking | VOT2022 | EAO | 0.692 | SAM 2.1 |
| Video Object Segmentation | M³-VOS | Average IoU | 69.5 | SAM 2 |
| Video Object Segmentation | MOSE | J&F | 77.9 | SAM 2 |
| Video Object Segmentation | DAVIS 2017 (val) | J&F | 90.7 | SAM 2 |
| Video Object Segmentation | DAVIS 2017 (val) | Params (M) | 224.4 | SAM 2 |
| Semi-Supervised Video Object Segmentation | MOSE | J&F | 77.9 | SAM 2 |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | J&F | 90.7 | SAM 2 |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Params (M) | 224.4 | SAM 2 |
| Visual Object Tracking | DiDi | Tracking quality | 0.649 | SAM 2.1 |
| Visual Object Tracking | VOT2022 | EAO | 0.692 | SAM 2.1 |
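Several rows above report J&F, the DAVIS-style score that averages two quantities. The first is region similarity J, which is the mask IoU. The second is contour accuracy F, an F-measure between predicted and ground-truth boundary pixels, where pixels are matched within a small distance tolerance. The following sketch computes both for one frame and one object using NumPy. It is a simplified approximation, not the official DAVIS evaluation code. The helper names are hypothetical, and boundaries come from 4-neighbour erosion. The tolerance is a square window of `ceil(0.008 × image diagonal)` pixels, whereas the official toolkit dilates with a disk. Benchmark J&F also averages these per-frame scores over frames and objects.

```python
import math

import numpy as np


def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """J: intersection-over-union of two masks (defined as 1.0 when both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def _shift(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a boolean mask by (dy, dx); pixels shifted in from outside are False."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    if abs(dy) >= h or abs(dx) >= w:
        return out
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        mask[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out


def _boundary(mask: np.ndarray) -> np.ndarray:
    """Foreground pixels with at least one 4-neighbour in the background (or off-image)."""
    eroded = mask & _shift(mask, 1, 0) & _shift(mask, -1, 0) & _shift(mask, 0, 1) & _shift(mask, 0, -1)
    return mask & ~eroded


def _dilate(mask: np.ndarray, r: int) -> np.ndarray:
    """Square (Chebyshev-distance) dilation by radius r."""
    out = np.zeros_like(mask)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out |= _shift(mask, dy, dx)
    return out


def contour_accuracy(pred: np.ndarray, gt: np.ndarray, bound_th: float = 0.008) -> float:
    """F: boundary F-measure with a tolerance of ceil(bound_th * diagonal) pixels."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    r = max(1, math.ceil(bound_th * math.hypot(*gt.shape)))
    pb, gb = _boundary(pred), _boundary(gt)
    n_pred, n_gt = pb.sum(), gb.sum()
    if n_pred == 0 and n_gt == 0:
        return 1.0
    if n_pred == 0 or n_gt == 0:
        return 0.0
    precision = (pb & _dilate(gb, r)).sum() / n_pred  # predicted boundary near a GT boundary
    recall = (gb & _dilate(pb, r)).sum() / n_gt       # GT boundary near a predicted boundary
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


def j_and_f(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean of region similarity J and contour accuracy F for one frame and object."""
    return (region_similarity(pred, gt) + contour_accuracy(pred, gt)) / 2
```

For example, a prediction that covers the left half of a 10×10 ground-truth square has J = 0.5. Its F is also below 1, because the prediction's right edge lies far from any ground-truth boundary.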

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
- Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
- Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)