Image Segmentation Using Text and Image Prompts

Timo Lüddecke, Alexander S. Ecker

2021-12-18, CVPR 2022

Tasks: Referring Image Matting (RefMatte-RW100), One-Shot Segmentation, Referring Expression, Zero Shot Segmentation, Referring Image Matting (Expression-based), Referring Expression Segmentation, Segmentation, Semantic Segmentation, Referring Image Matting (Keyword-based), Image Segmentation

Abstract

Image segmentation is usually addressed by training a model for a fixed set of object classes. Incorporating additional classes or more complex queries later is expensive as it requires re-training the model on a dataset that encompasses these expressions. Here we propose a system that can generate image segmentations based on arbitrary prompts at test time. A prompt can be either a text or an image. This approach enables us to create a unified model (trained once) for three common segmentation tasks, which come with distinct challenges: referring expression segmentation, zero-shot segmentation and one-shot segmentation. We build upon the CLIP model as a backbone which we extend with a transformer-based decoder that enables dense prediction. After training on an extended version of the PhraseCut dataset, our system generates a binary segmentation map for an image based on a free-text prompt or on an additional image expressing the query. We analyze different variants of the latter image-based prompts in detail. This novel hybrid input allows for dynamic adaptation not only to the three segmentation tasks mentioned above, but to any binary segmentation task where a text or image query can be formulated. Finally, we find our system to adapt well to generalized queries involving affordances or properties. Code is available at https://eckerlab.org/code/clipseg.
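As a small illustration of the final step described in the abstract (producing a binary segmentation map), the sketch below turns a grid of per-pixel scores into a 0/1 mask. It assumes the model emits one logit per pixel and that a sigmoid followed by a 0.5 threshold is an acceptable decision rule; the function name `binarize_logits` and the threshold value are hypothetical and not taken from the paper or its code.

```python
import math


def binarize_logits(logits, threshold=0.5):
    """Convert a 2D grid of per-pixel logits into a binary mask.

    Assumption: each logit is squashed with a sigmoid and compared
    against `threshold`; the paper may use a different rule.
    """
    mask = []
    for row in logits:
        mask_row = []
        for x in row:
            prob = 1.0 / (1.0 + math.exp(-x))  # sigmoid
            mask_row.append(1 if prob >= threshold else 0)
        mask.append(mask_row)
    return mask


# Toy 2x2 "prediction" for a single prompt (values are made up).
logits = [[-2.0, 0.3],
          [4.0, -0.1]]
mask = binarize_logits(logits)
print(mask)
```

The same post-processing would apply whether the prompt was a text or an image, since the prompt only affects how the logits are produced, not how they are thresholded.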

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Referring Image Matting | RefMatte | MAD | 0.0394 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MAD(E) | 0.0419 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MSE | 0.0358 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MSE(E) | 0.0381 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | SAD | 69.13 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | SAD(E) | 73.53 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MAD | 0.0101 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MAD(E) | 0.0106 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MSE | 0.0064 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MSE(E) | 0.0067 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | SAD | 17.75 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | SAD(E) | 18.69 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MAD | 0.1222 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MAD(E) | 0.1282 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MSE | 0.1178 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | MSE(E) | 0.1236 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | SAD | 211.86 | CLIPSeg (ViT-B/16) |
| Referring Image Matting | RefMatte | SAD(E) | 222.37 | CLIPSeg (ViT-B/16) |
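
For readers unfamiliar with the metric names in the results, the sketch below computes the three base quantities in their usual textbook form: SAD (sum of absolute differences), MSE (mean squared error), and MAD (mean absolute difference) between a predicted and a ground-truth alpha matte. It is only an illustration under stated assumptions: mattes are plain 2D lists with values in [0, 1], the function name `matting_errors` is hypothetical, and the benchmark may apply extra scaling or restrict the computation to specific regions. The "(E)" variants are not defined on this page and are not covered.

```python
def matting_errors(pred, target):
    """Return (SAD, MSE, MAD) for two equally shaped 2D alpha mattes.

    Assumption: values lie in [0, 1] and every pixel counts equally;
    benchmark implementations may rescale or mask these quantities.
    """
    diffs = [p - t
             for pred_row, target_row in zip(pred, target)
             for p, t in zip(pred_row, target_row)]
    n = len(diffs)
    sad = sum(abs(d) for d in diffs)      # sum of absolute differences
    mse = sum(d * d for d in diffs) / n   # mean squared error
    mad = sad / n                         # mean absolute difference
    return sad, mse, mad


# Toy 2x2 mattes (values are made up for illustration).
pred = [[0.0, 0.5],
        [1.0, 1.0]]
target = [[0.0, 1.0],
          [1.0, 0.0]]
sad, mse, mad = matting_errors(pred, target)
print(sad, mse, mad)
```

All three are error measures, so lower values indicate a closer match to the ground truth.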
