Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation

Minhyun Lee, Seungho Lee, Song Park, Dongyoon Han, Byeongho Heo, Hyunjung Shim

2024-11-28 | Data Augmentation | Referring Expression Segmentation | Image Segmentation

Abstract

Referring Image Segmentation (RIS) is an advanced vision-language task that involves identifying and segmenting objects within an image as described by free-form text descriptions. While previous studies have focused on aligning visual and language features, training techniques such as data augmentation remain underexplored. In this work, we explore effective data augmentation for RIS and propose a novel training framework called Masked Referring Image Segmentation (MaskRIS). We observe that conventional image augmentations fall short for RIS, leading to performance degradation, while simple random masking significantly enhances the performance of RIS. MaskRIS uses both image and text masking, followed by Distortion-aware Contextual Learning (DCL) to fully exploit the benefits of the masking strategy. This approach can improve the model's robustness to occlusions, incomplete information, and various linguistic complexities, resulting in a significant performance improvement. Experiments demonstrate that MaskRIS can easily be applied to various RIS models, outperforming existing methods in both fully supervised and weakly supervised settings. Finally, MaskRIS achieves new state-of-the-art performance on the RefCOCO, RefCOCO+, and RefCOCOg datasets. Code is available at https://github.com/naver-ai/maskris.
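
To make the "simple random masking" idea in the abstract concrete, here is a minimal sketch of patch-level image masking and token-level text masking. It is not the authors' implementation: the function names, the patch size, the mask ratios, the fill value, and the `[MASK]` placeholder are all illustrative assumptions, and the Distortion-aware Contextual Learning (DCL) step is not shown. The official code is at the repository linked above.

```python
import random


def mask_image_patches(image, patch_size, mask_ratio, rng, fill=0):
    """Return a copy of `image` (a list of pixel rows) in which a random
    subset of non-overlapping patch_size x patch_size patches is set to `fill`.

    Hypothetical helper for illustration; all parameters are assumptions.
    """
    height, width = len(image), len(image[0])
    patches = [
        (r, c)
        for r in range(height // patch_size)
        for c in range(width // patch_size)
    ]
    num_masked = int(round(mask_ratio * len(patches)))
    out = [list(row) for row in image]  # copy so the input stays untouched
    for r, c in rng.sample(patches, num_masked):
        for y in range(r * patch_size, (r + 1) * patch_size):
            for x in range(c * patch_size, (c + 1) * patch_size):
                out[y][x] = fill
    return out


def mask_text_tokens(tokens, mask_ratio, rng, mask_token="[MASK]"):
    """Replace each token with `mask_token` independently with
    probability `mask_ratio` (an assumed masking scheme)."""
    return [mask_token if rng.random() < mask_ratio else t for t in tokens]


rng = random.Random(0)
image = [[1] * 8 for _ in range(8)]            # toy 8x8 single-channel image
tokens = "the man in the red shirt".split()    # toy referring expression

masked_image = mask_image_patches(image, patch_size=2, mask_ratio=0.5, rng=rng)
masked_tokens = mask_text_tokens(tokens, mask_ratio=0.3, rng=rng)
```

In a training loop, the masked image and masked expression would be fed to the RIS model alongside (or instead of) the originals; how the two views are combined is specific to the paper and is left out here.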

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Instance Segmentation | RefCOCO testA | Overall IoU | 80.64 | MaskRIS (Swin-B, combined DB) |
| Instance Segmentation | RefCOCO testA | Mean IoU | 80.24 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO testA | Overall IoU | 78.96 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO val | Overall IoU | 78.71 | MaskRIS (Swin-B, combined DB) |
| Instance Segmentation | RefCOCO val | Mean IoU | 78.35 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO val | Overall IoU | 76.49 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO testB | Overall IoU | 75.1 | MaskRIS (Swin-B, combined DB) |
| Instance Segmentation | RefCOCO testB | Mean IoU | 76.06 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO testB | Overall IoU | 73.96 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCOg-test | Overall IoU | 71.09 | MaskRIS (Swin-B, combined DB) |
| Instance Segmentation | RefCOCOg-test | Mean IoU | 69.42 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCOg-test | Overall IoU | 66.5 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO+ val | Overall IoU | 70.26 | MaskRIS (Swin-B, combined DB) |
| Instance Segmentation | RefCOCO+ val | Mean IoU | 71.68 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO+ val | Overall IoU | 67.54 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO+ testB | Overall IoU | 62.83 | MaskRIS (Swin-B, combined DB) |
| Instance Segmentation | RefCOCO+ testB | Mean IoU | 64.5 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO+ testB | Overall IoU | 59.39 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO+ testA | Overall IoU | 75.15 | MaskRIS (Swin-B, combined DB) |
| Instance Segmentation | RefCOCO+ testA | Mean IoU | 76.73 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCO+ testA | Overall IoU | 74.46 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCOg-val | Overall IoU | 69.12 | MaskRIS (Swin-B, combined DB) |
| Instance Segmentation | RefCOCOg-val | Mean IoU | 69.31 | MaskRIS (Swin-B) |
| Instance Segmentation | RefCOCOg-val | Overall IoU | 65.55 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO testA | Overall IoU | 80.64 | MaskRIS (Swin-B, combined DB) |
| Referring Expression Segmentation | RefCOCO testA | Mean IoU | 80.24 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO testA | Overall IoU | 78.96 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO val | Overall IoU | 78.71 | MaskRIS (Swin-B, combined DB) |
| Referring Expression Segmentation | RefCOCO val | Mean IoU | 78.35 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO val | Overall IoU | 76.49 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO testB | Overall IoU | 75.1 | MaskRIS (Swin-B, combined DB) |
| Referring Expression Segmentation | RefCOCO testB | Mean IoU | 76.06 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO testB | Overall IoU | 73.96 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCOg-test | Overall IoU | 71.09 | MaskRIS (Swin-B, combined DB) |
| Referring Expression Segmentation | RefCOCOg-test | Mean IoU | 69.42 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCOg-test | Overall IoU | 66.5 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO+ val | Overall IoU | 70.26 | MaskRIS (Swin-B, combined DB) |
| Referring Expression Segmentation | RefCOCO+ val | Mean IoU | 71.68 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO+ val | Overall IoU | 67.54 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO+ testB | Overall IoU | 62.83 | MaskRIS (Swin-B, combined DB) |
| Referring Expression Segmentation | RefCOCO+ testB | Mean IoU | 64.5 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO+ testB | Overall IoU | 59.39 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO+ testA | Overall IoU | 75.15 | MaskRIS (Swin-B, combined DB) |
| Referring Expression Segmentation | RefCOCO+ testA | Mean IoU | 76.73 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCO+ testA | Overall IoU | 74.46 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCOg-val | Overall IoU | 69.12 | MaskRIS (Swin-B, combined DB) |
| Referring Expression Segmentation | RefCOCOg-val | Mean IoU | 69.31 | MaskRIS (Swin-B) |
| Referring Expression Segmentation | RefCOCOg-val | Overall IoU | 65.55 | MaskRIS (Swin-B) |
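
The results report two metrics, Overall IoU and Mean IoU, which this page does not define. The sketch below assumes the definitions conventionally used in RIS evaluation: Overall IoU divides the intersection summed over all samples by the union summed over all samples, while Mean IoU averages the per-sample IoU. The function name, the flat 0/1 mask representation, and the handling of an empty union are illustrative assumptions, not taken from the MaskRIS code.

```python
def iou_metrics(predictions, targets):
    """Compute (overall_iou, mean_iou) for paired binary masks.

    Each mask is a flat sequence of 0/1 values. A sample whose union is
    empty is scored as 0.0 here; that convention is an assumption.
    """
    total_inter = 0
    total_union = 0
    per_sample = []
    for pred, target in zip(predictions, targets):
        inter = sum(1 for p, t in zip(pred, target) if p and t)
        union = sum(1 for p, t in zip(pred, target) if p or t)
        total_inter += inter
        total_union += union
        per_sample.append(inter / union if union else 0.0)
    overall_iou = total_inter / total_union if total_union else 0.0
    mean_iou = sum(per_sample) / len(per_sample) if per_sample else 0.0
    return overall_iou, mean_iou


# Toy example: one large object (IoU 8/10) and one small object (IoU 1/2).
large_target, large_pred = [1] * 10, [1] * 8 + [0] * 2
small_target, small_pred = [1, 0, 0, 0], [1, 1, 0, 0]

overall_iou, mean_iou = iou_metrics(
    [large_pred, small_pred], [large_target, small_target]
)
```

Here Overall IoU is 9/12 = 0.75 and Mean IoU is (0.8 + 0.5) / 2 = 0.65. Under these definitions Overall IoU is dominated by large objects, whereas Mean IoU weights every sample equally, so the two values need not move together.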

Related Papers

Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Data Augmentation in Time Series Forecasting through Inverted Framework (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)
Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation (2025-07-15)