Referring Image Matting

Jizhizi Li, Jing Zhang, Dacheng Tao

Published: 2022-06-10 · CVPR 2023
Tasks: Referring Image Matting (RefMatte-RW100) · Visual Grounding · Image Matting · Referring Image Matting (Expression-based) · Domain Generalization · Referring Image Matting (Keyword-based)
Links: Paper · PDF · Code (official)

Abstract

Different from conventional image matting, which either requires user-defined scribbles/trimaps to extract a specific foreground object or indiscriminately extracts all foreground objects in the image, we introduce a new task named Referring Image Matting (RIM), which aims to extract the meticulous alpha matte of the specific object that best matches a given natural language description, thus enabling a more natural and simpler way to instruct image matting. First, we establish RefMatte, a large-scale, challenging dataset, by designing a comprehensive image composition and expression generation engine that automatically produces high-quality images along with diverse text attributes based on public datasets. RefMatte consists of 230 object categories, 47,500 images, 118,749 expression-region entities, and 474,996 expressions. Additionally, we construct a real-world test set of 100 high-resolution natural images with manually annotated complex phrases to evaluate the out-of-domain generalization abilities of RIM methods. Furthermore, we present CLIPMat, a novel baseline method for RIM, which includes a context-embedded prompt, a text-driven semantic pop-up, and a multi-level details extractor. Extensive experiments on RefMatte in both keyword and expression settings validate the superiority of CLIPMat over representative methods. We hope this work provides novel insights into image matting and encourages more follow-up studies. The dataset, code and models are available at https://github.com/JizhiziLi/RIM.
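
RIM changes only how the target object is specified; the output is still a standard alpha matte, so downstream use follows the classical compositing equation I = αF + (1 − α)B. The sketch below is illustrative only: the `rim_model` call is a hypothetical stand-in for a trained RIM model such as CLIPMat (not its actual API), and it shows the task's input/output contract plus how a predicted matte is used for compositing.

```python
import numpy as np

def composite(fg, bg, alpha):
    """Classical compositing equation: I = alpha * F + (1 - alpha) * B.

    fg, bg: float RGB images in [0, 1], shape (H, W, 3).
    alpha:  float matte in [0, 1], shape (H, W).
    """
    a = alpha[..., None]               # broadcast matte over RGB channels
    return a * fg + (1.0 - a) * bg

# Hypothetical RIM interface: image + free-form text in, alpha matte out.
# (Stand-in for a trained model such as CLIPMat; not its published API.)
# alpha = rim_model(image, "the fluffy dog sitting on the left")
# new_image = composite(image, new_background, alpha)
```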

Results

Task | Dataset | Metric | Value | Model
---- | ------- | ------ | ----- | -----
Referring Image Matting | RefMatte | MAD | 0.0238 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MAD(E) | 0.0254 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MSE | 0.0212 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MSE(E) | 0.0226 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | SAD | 42.05 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | SAD(E) | 44.77 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MAD | 0.0273 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MAD(E) | 0.0273 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MSE | 0.0245 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MSE(E) | 0.0260 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | SAD | 47.97 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | SAD(E) | 50.84 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MAD | 0.0049 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MAD(E) | 0.0051 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MSE | 0.0022 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MSE(E) | 0.0023 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | SAD | 8.51 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | SAD(E) | 8.98 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MAD | 0.0057 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MAD(E) | 0.0059 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MSE | 0.0028 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MSE(E) | 0.0029 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | SAD | 9.91 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | SAD(E) | 10.41 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MAD | 0.0510 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MAD(E) | 0.0505 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MSE | 0.0488 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MSE(E) | 0.0483 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | SAD | 88.52 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | SAD(E) | 87.92 | CLIPMat (ViT-L/14)
Referring Image Matting | RefMatte | MAD | 0.0636 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MAD(E) | 0.0635 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MSE | 0.0614 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | MSE(E) | 0.0612 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | SAD | 110.66 | CLIPMat (ViT-B/16)
Referring Image Matting | RefMatte | SAD(E) | 110.63 | CLIPMat (ViT-B/16)

Note: the archive listing repeats each metric in three blocks with different values; these appear to correspond to the three separate RIM benchmark entries named above (keyword-based, expression-based, and RefMatte-RW100), flattened here under a single "RefMatte" dataset label.
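
SAD (Sum of Absolute Differences), MAD (Mean Absolute Difference), and MSE (Mean Squared Error) above are standard matting error measures computed between the predicted and ground-truth alpha mattes; the (E) suffix marks a second evaluation variant reported by the benchmark. A minimal NumPy sketch of the unscaled definitions follows (the function name `matting_errors` is my own; leaderboard numbers are often rescaled, e.g. SAD reported in thousands of alpha units, so take the exact convention from the paper's official evaluation code):

```python
import numpy as np

def matting_errors(pred_alpha, gt_alpha):
    """Unscaled matting error metrics between two alpha mattes.

    pred_alpha, gt_alpha: float arrays in [0, 1] with identical shape.
    Leaderboards often rescale these values; follow the official
    evaluation code for the exact convention.
    """
    diff = pred_alpha.astype(np.float64) - gt_alpha.astype(np.float64)
    return {
        "SAD": np.abs(diff).sum(),    # Sum of Absolute Differences
        "MAD": np.abs(diff).mean(),   # Mean Absolute Difference
        "MSE": (diff ** 2).mean(),    # Mean Squared Error
    }
```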

Related Papers

- Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
- GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
- MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
- InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)
- ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition (2025-07-15)
- From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion (2025-07-11)
- VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation (2025-07-09)
- A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding (2025-07-09)