M$^3$-VOS

M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation

Images · Texts · Videos · apache-2.0 · Introduced 2025-06-15

💡 Description

M$^3$-VOS (Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation) is a new benchmark for verifying how well models understand object phases. It consists of 479 high-resolution videos spanning more than 10 distinct everyday scenarios. We collected 205,181 masks, with an average track duration of 14.27s. M$^3$-VOS covers 120+ object categories across 6 phases within 14 scenarios, encompassing 23 specific phase transitions.