TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/MeViS: A Large-scale Benchmark for Video Segmentation with...

MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Chen Change Loy

2023-08-16ICCV 2023 1Referring Video Object SegmentationSegmentationSemantic SegmentationVideo SegmentationVideo Object SegmentationVideo Semantic Segmentation
PaperPDFCode(official)

Abstract

This paper strives for motion expressions guided video segmentation, which focuses on segmenting objects in video content based on a sentence describing the motion of the objects. Existing referring video object datasets typically focus on salient objects and use language expressions that contain excessive static attributes that could potentially enable the target object to be identified in a single frame. These datasets downplay the importance of motion in video content for language-guided video object segmentation. To investigate the feasibility of using motion expressions to ground and segment objects in videos, we propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments. We benchmarked 5 existing referring video object segmentation (RVOS) methods and conducted a comprehensive comparison on the MeViS dataset. The results show that current RVOS methods cannot effectively address motion expression-guided video segmentation. We further analyze the challenges and propose a baseline approach for the proposed MeViS dataset. The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms that leverage motion expressions as a primary cue for object segmentation in complex video scenes. The proposed MeViS dataset has been released at https://henghuiding.github.io/MeViS.

Results

TaskDatasetMetricValueModel
VideoReVOSF31.7LMPM (Swin-T)
VideoReVOSJ21.2LMPM (Swin-T)
VideoReVOSJ&F26.4LMPM (Swin-T)
VideoReVOSR3.2LMPM (Swin-T)
VideoMeViSF40.2LMPM
VideoMeViSJ34.2LMPM
VideoMeViSJ&F37.2LMPM
Video Object SegmentationReVOSF31.7LMPM (Swin-T)
Video Object SegmentationReVOSJ21.2LMPM (Swin-T)
Video Object SegmentationReVOSJ&F26.4LMPM (Swin-T)
Video Object SegmentationReVOSR3.2LMPM (Swin-T)
Video Object SegmentationMeViSF40.2LMPM
Video Object SegmentationMeViSJ34.2LMPM
Video Object SegmentationMeViSJ&F37.2LMPM

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction2025-07-21Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction2025-07-17DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model2025-07-17From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation2025-07-17Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion2025-07-17SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation2025-07-17Unified Medical Image Segmentation with State Space Modeling Snake2025-07-17A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique2025-07-17