Description
VOS is a video object segmentation model built from two network components. The target appearance model is a light-weight module learned during inference using fast optimization techniques; it predicts a coarse but robust segmentation of the target. The segmentation model is trained exclusively offline and is designed to refine these coarse scores into high-quality segmentation masks.
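The two-component split can be illustrated with a toy sketch. This is not the authors' implementation: the online target model is stood in for by closed-form ridge regression on per-pixel features (the kind of fast optimization the description refers to), and the offline segmentation network is stood in for by a simple threshold. All function names, shapes, and parameters here are illustrative assumptions.

```python
import numpy as np

def fit_target_model(features, labels, reg=0.1):
    """Hypothetical online step: learn a light-weight target appearance
    model by ridge regression, solved in closed form (fast at inference).
    features: (N, D) per-pixel features; labels: (N,) mask in {0, 1}."""
    y = 2.0 * labels - 1.0          # map {0,1} -> {-1,+1}
    F = features
    return np.linalg.solve(F.T @ F + reg * np.eye(F.shape[1]), F.T @ y)

def coarse_scores(features, w):
    """Apply the target model to a new frame's features -> coarse scores."""
    return features @ w

def refine(scores):
    """Stand-in for the offline-trained segmentation network: here just a
    threshold; the real network decodes coarse scores into a fine mask."""
    return (scores > 0.0).astype(np.uint8)

# Toy demo with clearly separable foreground/background features.
rng = np.random.default_rng(0)
fg = rng.normal(+1.0, 0.1, size=(50, 4))    # foreground pixels
bg = rng.normal(-1.0, 0.1, size=(50, 4))    # background pixels
feats = np.vstack([fg, bg])
mask = np.concatenate([np.ones(50), np.zeros(50)])

w = fit_target_model(feats, mask)           # learned online, per target
pred = refine(coarse_scores(feats, w))      # coarse scores -> final mask
```

The point of the split mirrors the description: only the small target model is optimized at test time, while the heavier refinement component is trained once offline and reused across videos.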
Papers Using This Method
VideoMolmo: Spatio-Temporal Grounding Meets Pointing (2025-06-05)
Foundations of Unknown-aware Machine Learning (2025-05-20)
RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory (2025-04-23)
FVOS for MOSE Track of 4th PVUW Challenge: 3rd Place Solution (2025-04-13)
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation (2025-04-10)
Zero-Shot 4D Lidar Panoptic Segmentation (2025-04-01)
An Analysis of Data Transformation Effects on Segment Anything 2 (2025-02-25)
Efficient Track Anything (2024-11-28)
Addressing Issues with Working Memory in Video Object Segmentation (2024-10-29)
X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation (2024-09-28)
LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation (2024-09-09)
Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS (2024-08-29)
CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track (2024-08-24)
LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS (2024-08-20)
UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track (2024-08-19)
Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track (2024-08-19)
Improving Unsupervised Video Object Segmentation via Fake Flow Generation (2024-07-16)
RMem: Restricted Memory Banks Improve Video Object Segmentation (2024-06-12)
Training-Free Robust Interactive Video Object Segmentation (2024-06-08)
A Semi-Self-Supervised Approach for Dense-Pattern Video Object Segmentation (2024-06-07)