Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding

Hai Nguyen-Truong, E-Ro Nguyen, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung

2024-04-12 | Tasks: Referring Video Object Segmentation, Referring Expression Segmentation, Segmentation, Semantic Segmentation, Image Segmentation

Abstract

Referring image segmentation is a challenging task that involves generating pixel-wise segmentation masks based on natural language descriptions. The complexity of this task increases with the intricacy of the sentences provided. Existing methods have relied mostly on visual features to generate the segmentation masks while treating text features as supporting components. However, this under-utilization of text understanding limits the model's capability to fully comprehend the given expressions. In this work, we propose a novel framework that specifically emphasizes object and context comprehension, inspired by human cognitive processes, through Vision-Aware Text Features. First, we introduce a CLIP Prior module to localize the main object of interest and embed the object heatmap into the query initialization process. Second, we propose a combination of two components, the Contextual Multimodal Decoder and the Meaning Consistency Constraint, to further enhance the coherent and consistent interpretation of language cues with the contextual understanding obtained from the image. Our method achieves significant performance improvements on three benchmark datasets: RefCOCO, RefCOCO+ and G-Ref. Project page: https://vatex.hkustvgd.com/
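The CLIP Prior step described above (localize the referred object, then feed its heatmap into query initialization) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes patch and sentence embeddings from a CLIP-like image/text encoder pair (replaced here by random vectors), computes the heatmap as cosine similarity between each patch and the sentence, and injects it into the queries through a hypothetical heatmap-weighted pooling step. The names `text_patch_heatmap` and `init_queries` are illustrative only.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def text_patch_heatmap(patch_emb, text_emb, height, width):
    """Return a height x width grid of patch-text similarities scaled to [0, 1].

    patch_emb: height*width patch vectors in row-major order, assumed to come
    from a CLIP-like image encoder; text_emb: sentence vector from the paired
    text encoder (also an assumption).
    """
    sims = [cosine(p, text_emb) for p in patch_emb]
    lo, hi = min(sims), max(sims)
    scaled = [(s - lo) / (hi - lo + 1e-8) for s in sims]  # min-max scaling
    return [scaled[r * width:(r + 1) * width] for r in range(height)]


def init_queries(base_queries, pixel_feats, heat):
    """Hypothetical query initialization: add a heatmap-weighted object feature.

    base_queries: Q learnable query vectors of size C (random stand-ins here).
    pixel_feats:  height x width grid of C-dim visual features.
    heat:         height x width heatmap from text_patch_heatmap.
    """
    total = sum(sum(row) for row in heat) + 1e-8
    channels = len(base_queries[0])
    object_feat = [0.0] * channels
    for r, row in enumerate(heat):
        for c, w in enumerate(row):
            for k in range(channels):
                object_feat[k] += (w / total) * pixel_feats[r][c][k]
    # Every query keeps its own embedding and receives the same object prior.
    return [[q[k] + object_feat[k] for k in range(channels)] for q in base_queries]


rng = random.Random(0)
H, W, D, C, Q = 4, 4, 8, 16, 5
text_emb = [rng.gauss(0, 1) for _ in range(D)]
patch_emb = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(H * W)]
patch_emb[6] = [3.0 * x for x in text_emb]  # plant a text-aligned patch at row 1, col 2

heat = text_patch_heatmap(patch_emb, text_emb, H, W)
pixel_feats = [[[rng.gauss(0, 1) for _ in range(C)] for _ in range(W)] for _ in range(H)]
queries = init_queries([[rng.gauss(0, 1) for _ in range(C)] for _ in range(Q)], pixel_feats, heat)

# Locate the heatmap peak; it should be the planted patch.
peak = max(((r, c) for r in range(H) for c in range(W)), key=lambda rc: heat[rc[0]][rc[1]])
print(peak, len(queries), len(queries[0]))
```

Adding one pooled object feature to every query is just one assumed design: it biases all queries toward the text-matched region while the learnable part keeps them distinct.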

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Refer-YouTube-VOS | F | 67.5 | VATEX |
| Video | Refer-YouTube-VOS | J | 63.3 | VATEX |
| Video | Refer-YouTube-VOS | J&F | 65.4 | VATEX |
| Instance Segmentation | RefCOCO testA | mIoU | 79.64 | VATEX |
| Instance Segmentation | RefCoCo val | mIoU | 78.16 | VATEX |
| Instance Segmentation | RefCOCO testB | mIoU | 75.64 | VATEX |
| Instance Segmentation | RefCOCOg-test | mIoU | 70.58 | VATEX |
| Instance Segmentation | RefCOCO+ val | Mean IoU | 70.02 | VATEX |
| Instance Segmentation | RefCOCO+ test B | mIoU | 62.52 | VATEX |
| Instance Segmentation | DAVIS 2017 (val) | J&F score | 65.4 | VATEX |
| Instance Segmentation | RefCOCO+ testA | mIoU | 74.41 | VATEX |
| Instance Segmentation | RefCOCOg-val | IoU | 0.7554 | VATEX |
| Instance Segmentation | RefCOCOg-val | mIoU | 69.73 | VATEX |
| Video Object Segmentation | Refer-YouTube-VOS | F | 67.5 | VATEX |
| Video Object Segmentation | Refer-YouTube-VOS | J | 63.3 | VATEX |
| Video Object Segmentation | Refer-YouTube-VOS | J&F | 65.4 | VATEX |
| Referring Expression Segmentation | RefCOCO testA | mIoU | 79.64 | VATEX |
| Referring Expression Segmentation | RefCoCo val | mIoU | 78.16 | VATEX |
| Referring Expression Segmentation | RefCOCO testB | mIoU | 75.64 | VATEX |
| Referring Expression Segmentation | RefCOCOg-test | mIoU | 70.58 | VATEX |
| Referring Expression Segmentation | RefCOCO+ val | Mean IoU | 70.02 | VATEX |
| Referring Expression Segmentation | RefCOCO+ test B | mIoU | 62.52 | VATEX |
| Referring Expression Segmentation | DAVIS 2017 (val) | J&F score | 65.4 | VATEX |
| Referring Expression Segmentation | RefCOCO+ testA | mIoU | 74.41 | VATEX |
| Referring Expression Segmentation | RefCOCOg-val | IoU | 0.7554 | VATEX |
| Referring Expression Segmentation | RefCOCOg-val | mIoU | 69.73 | VATEX |
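The metrics in the table follow conventions that are common in this literature, sketched below under stated assumptions. mIoU averages the per-expression IoU, overall IoU (often written oIoU, or simply IoU) divides the total intersection by the total union over a whole split, and J&F is the mean of region similarity J and contour accuracy F. The masks are toy, flattened binary lists, and the page itself does not say which IoU definition each row uses; the helper names are illustrative.

```python
def _inter_union(pred, gt):
    """Count intersecting and union pixels of two flattened binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def iou(pred, gt):
    """IoU of one mask pair; an empty union is scored 1.0 (an assumed convention)."""
    inter, union = _inter_union(pred, gt)
    return inter / union if union else 1.0


def mean_iou(pairs):
    """mIoU: every referring expression contributes equally."""
    return sum(iou(p, g) for p, g in pairs) / len(pairs)


def overall_iou(pairs):
    """Overall IoU: cumulative intersection over cumulative union across the split."""
    totals = [_inter_union(p, g) for p, g in pairs]
    inter = sum(i for i, _ in totals)
    union = sum(u for _, u in totals)
    return inter / union if union else 1.0


def j_and_f(j, f):
    """J&F: arithmetic mean of region similarity J and contour accuracy F."""
    return (j + f) / 2


pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),                          # small object, IoU 1/2
    ([1, 1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1]),  # large object, IoU 6/8
]
print(mean_iou(pairs), overall_iou(pairs))
print(round(j_and_f(63.3, 67.5), 1))
```

The Refer-YouTube-VOS rows agree with the J&F definition: (63.3 + 67.5) / 2 = 65.4. The toy masks also show why the two IoU variants differ: overall IoU (7/10 = 0.7) weights the larger mask more heavily than mIoU ((0.5 + 0.75) / 2 = 0.625).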

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
- Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
- Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)