Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation

Miriam Bellver, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, Xavier Giro-i-Nieto

2020-10-01 | Referring Expression Segmentation, Segmentation, Video Object Segmentation, Image Segmentation

Abstract

The task of video object segmentation with referring expressions (language-guided VOS) is, given a linguistic phrase and a video, to generate binary masks for the object to which the phrase refers. Our work argues that existing benchmarks used for this task are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the phrases in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, with the non-trivial REs annotated with seven RE semantic categories. We leverage this data to analyze the results of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state-of-the-art results for language-guided VOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Instance Segmentation | RefCoCo val | Overall IoU | 59.45 | RefVOS with BERT + MLM loss |
| Instance Segmentation | RefCoCo val | Overall IoU | 58.65 | RefVOS with BERT Pre-train |
| Instance Segmentation | A2D Sentences | IoU mean | 0.599 | RefVOS |
| Instance Segmentation | A2D Sentences | IoU overall | 0.599 | RefVOS |
| Instance Segmentation | A2D Sentences | Precision@0.5 | 0.495 | RefVOS |
| Instance Segmentation | A2D Sentences | Precision@0.9 | 0.064 | RefVOS |
| Instance Segmentation | RefCOCO+ val | Overall IoU | 44.71 | RefVOS with BERT + MLM loss |
| Instance Segmentation | A2Dre test | Mean IoU | 33.2 | RefVos |
| Instance Segmentation | A2Dre test | Overall IoU | 47.5 | RefVos |
| Instance Segmentation | RefCOCO+ test B | Overall IoU | 36.17 | RefVOS with BERT + MLM loss |
| Instance Segmentation | DAVIS 2017 (val) | J&F 1st frame | 45.1 | RefVOS |
| Instance Segmentation | DAVIS 2017 (val) | J&F 1st frame | 44.5 | RefVOS |
| Instance Segmentation | DAVIS 2017 (val) | J&F Full video | 45.1 | RefVOS |
| Instance Segmentation | RefCOCO+ testA | Overall IoU | 49.73 | RefVOS with BERT + MLM Loss |
| Referring Expression Segmentation | RefCoCo val | Overall IoU | 59.45 | RefVOS with BERT + MLM loss |
| Referring Expression Segmentation | RefCoCo val | Overall IoU | 58.65 | RefVOS with BERT Pre-train |
| Referring Expression Segmentation | A2D Sentences | IoU mean | 0.599 | RefVOS |
| Referring Expression Segmentation | A2D Sentences | IoU overall | 0.599 | RefVOS |
| Referring Expression Segmentation | A2D Sentences | Precision@0.5 | 0.495 | RefVOS |
| Referring Expression Segmentation | A2D Sentences | Precision@0.9 | 0.064 | RefVOS |
| Referring Expression Segmentation | RefCOCO+ val | Overall IoU | 44.71 | RefVOS with BERT + MLM loss |
| Referring Expression Segmentation | A2Dre test | Mean IoU | 33.2 | RefVos |
| Referring Expression Segmentation | A2Dre test | Overall IoU | 47.5 | RefVos |
| Referring Expression Segmentation | RefCOCO+ test B | Overall IoU | 36.17 | RefVOS with BERT + MLM loss |
| Referring Expression Segmentation | DAVIS 2017 (val) | J&F 1st frame | 45.1 | RefVOS |
| Referring Expression Segmentation | DAVIS 2017 (val) | J&F 1st frame | 44.5 | RefVOS |
| Referring Expression Segmentation | DAVIS 2017 (val) | J&F Full video | 45.1 | RefVOS |
| Referring Expression Segmentation | RefCOCO+ testA | Overall IoU | 49.73 | RefVOS with BERT + MLM Loss |
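
The results above mix several IoU-based metrics (mean IoU, overall IoU, Precision@K). As a rough illustration of how these are commonly defined for binary masks, here is a minimal pure-Python sketch. It is not the RefVOS evaluation code: the function names, the list-of-rows mask representation, and the convention that two empty masks score an IoU of 1.0 are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # hypothetical format: binary mask as rows of 0/1


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and in the union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return inter, union


def evaluate(
    preds: Sequence[Mask],
    gts: Sequence[Mask],
    thresholds: Sequence[float] = (0.5, 0.9),
) -> Dict[str, float]:
    """Compute mean IoU, overall IoU, and Precision@K over a set of samples."""
    if len(preds) == 0 or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and of equal length")

    ious: List[float] = []
    total_inter = 0
    total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
        # Assumption: two empty masks are treated as a perfect match.
        ious.append(inter / union if union else 1.0)

    n = len(ious)
    results = {
        # Mean IoU: average of the per-sample IoU values.
        "mean_iou": sum(ious) / n,
        # Overall IoU: pooled intersection divided by pooled union.
        "overall_iou": total_inter / total_union if total_union else 1.0,
    }
    for t in thresholds:
        # Precision@K: fraction of samples whose IoU exceeds the threshold K.
        results[f"precision@{t}"] = sum(1 for iou in ious if iou > t) / n
    return results


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 1, 1, 1]]]
    gts = [[[1, 1], [0, 0]], [[1, 0, 0, 0]]]
    print(evaluate(preds, gts))
```

In this toy input the two metrics diverge: mean IoU weights each sample equally, while overall IoU pools pixels, so samples with large masks dominate it.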

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)