Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Referring Expression Segmentation on RefCoCo val

Metric: Overall IoU (higher is better)
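
In the referring image segmentation literature, overall IoU is usually computed cumulatively. Intersection and union pixel counts are summed over every expression in the split, and the ratio is taken once at the end. This means objects with many pixels carry more weight than they would in a per-sample mean IoU. The following is a minimal sketch in Python with NumPy and assumes one binary mask per expression. The function names `overall_iou` and `mean_iou` and the toy masks are hypothetical, and this is not the official evaluation script of any model on this leaderboard.

```python
import numpy as np


def overall_iou(preds, gts):
    """Cumulative IoU: summed intersections divided by summed unions.

    preds, gts: sequences of boolean masks; each pred/gt pair must share a shape.
    """
    total_inter = 0
    total_union = 0
    for pred, gt in zip(preds, gts):
        pred = np.asarray(pred, dtype=bool)
        gt = np.asarray(gt, dtype=bool)
        total_inter += int(np.logical_and(pred, gt).sum())
        total_union += int(np.logical_or(pred, gt).sum())
    # Guard against a split in which every mask is empty.
    return total_inter / total_union if total_union else 0.0


def mean_iou(preds, gts):
    """Per-sample IoU averaged over samples, shown only for contrast."""
    ious = []
    for pred, gt in zip(preds, gts):
        pred = np.asarray(pred, dtype=bool)
        gt = np.asarray(gt, dtype=bool)
        union = int(np.logical_or(pred, gt).sum())
        if union == 0:
            # Assumption: pairs where both masks are empty are skipped.
            continue
        ious.append(int(np.logical_and(pred, gt).sum()) / union)
    return float(np.mean(ious)) if ious else 0.0


# Hypothetical toy data: a large object predicted well, a small one predicted poorly.
gt_large = np.ones((10, 10), dtype=bool)       # 100 pixels
pred_large = gt_large.copy()
pred_large[9, :] = False                        # 90 of 100 pixels recovered
gt_small = np.zeros((10, 10), dtype=bool)
gt_small[0:2, 0:2] = True                       # 4 pixels
pred_small = np.zeros((10, 10), dtype=bool)
pred_small[0, 0] = True                         # 1 of 4 pixels recovered

preds = [pred_large, pred_small]
gts = [gt_large, gt_small]
print(f"overall IoU: {overall_iou(preds, gts):.3f}")
print(f"mean IoU:    {mean_iou(preds, gts):.3f}")
```

With these toy masks, overall IoU is (90 + 1) / (100 + 4) = 0.875. The per-sample mean is (0.9 + 0.25) / 2 = 0.575. The large object dominates the cumulative score.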

Results

| Rank | Model | Overall IoU (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | DeRIS-L | 85.41 | No | DeRIS: Decoupling Perception and Cognition for E... | 2025-07-02 | Code |
| 2 | HyperSeg | 84.8 | Yes | HyperSeg: Towards Universal Visual Segmentation ... | 2024-11-26 | Code |
| 3 | PSALM | 83.6 | Yes | PSALM: Pixelwise SegmentAtion with Large Multi-M... | 2024-03-21 | Code |
| 4 | MLCD-Seg-7B | 83.6 | Yes | Multi-label Cluster Discrimination for Visual Re... | 2024-07-24 | Code |
| 5 | HIPIE | 82.8 | Yes | Hierarchical Open-vocabulary Universal Image Seg... | 2023-07-03 | Code |
| 6 | EVF-SAM | 82.4 | Yes | EVF-SAM: Early Vision-Language Fusion for Text-P... | 2024-06-28 | Code |
| 7 | UNINEXT-H | 82.19 | Yes | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 8 | UniLSeg-100 | 81.74 | Yes | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 9 | DETRIS | 81 | No | Densely Connected Parameter-Efficient Tuning for... | 2025-01-15 | Code |
| 10 | C3VG | 80.89 | No | Multi-task Visual Grounding with Coarse-to-Fine ... | 2025-01-12 | Code |
| 11 | GLEE-Pro | 80 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 12 | SegAgent | 79.7 | No | SegAgent: Exploring Pixel Understanding Capabili... | 2025-03-11 | Code |
| 13 | MaskRIS (Swin-B, combined DB) | 78.71 | No | MaskRIS: Semantic Distortion-aware Data Augmenta... | 2024-11-28 | Code |
| 14 | GROUNDHOG | 78.5 | Yes | GROUNDHOG: Grounding Large Language Models to Ho... | 2024-02-26 | - |
| 15 | SafaRi-B | 77.21 | No | SafaRi:Adaptive Sequence Transformer for Weakly ... | 2024-07-02 | - |
| 16 | MaskRIS (Swin-B) | 76.49 | No | MaskRIS: Semantic Distortion-aware Data Augmenta... | 2024-11-28 | Code |
| 17 | PolyFormer-L | 75.96 | Yes | PolyFormer: Referring Image Segmentation as Sequ... | 2023-02-14 | Code |
| 18 | MagNet | 75.24 | No | Mask Grounding for Referring Image Segmentation | 2023-12-19 | Code |
| 19 | PolyFormer-B | 74.82 | Yes | PolyFormer: Referring Image Segmentation as Sequ... | 2023-02-14 | Code |
| 20 | ReLA | 73.82 | No | GRES: Generalized Referring Expression Segmentat... | 2023-06-01 | Code |
| 21 | VPD | 73.25 | No | Unleashing Text-to-Image Diffusion Models for Vi... | 2023-03-03 | Code |
| 22 | VLT | 72.96 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 23 | ETRIS | 71.06 | No | Bridging Vision and Language Encoders: Parameter... | 2023-07-21 | Code |
| 24 | RefTR | 70.56 | No | Referring Transformer: A One-step Approach to Mu... | 2021-06-06 | Code |
| 25 | CRIS | 70.47 | No | CRIS: CLIP-Driven Referring Image Segmentation | 2021-11-30 | Code |
| 26 | MaIL | 70.13 | No | MaIL: A Unified Mask-Image-Language Trimodal Net... | 2021-11-21 | - |
| 27 | VLT | 65.65 | No | Vision-Language Transformer and Query Generation... | 2021-08-12 | Code |
| 28 | SHNet | 65.32 | No | Comprehensive Multi-Modal Interactions for Refer... | 2021-04-21 | Code |
| 29 | CPMC | 61.36 | No | Referring Image Segmentation via Cross-Modal Pro... | 2020-10-01 | Code |
| 30 | BRINet | 61.35 | No | - | - | - |
| 31 | RefVOS with BERT + MLM loss | 59.45 | No | RefVOS: A Closer Look at Referring Expressions f... | 2020-10-01 | Code |
| 32 | LANG2SEG | 58.9 | No | Referring Expression Object Segmentation with Ca... | 2019-10-10 | Code |
| 33 | RefVOS with BERT Pre-train | 58.65 | No | RefVOS: A Closer Look at Referring Expressions f... | 2020-10-01 | Code |
| 34 | CMSA | 58.32 | No | Cross-Modal Self-Attention Network for Referring... | 2019-04-09 | Code |
| 35 | STEP (1-fold) | 56.58 | No | - | - | - |
| 36 | MattNet | 56.51 | No | MAttNet: Modular Attention Network for Referring... | 2018-01-24 | Code |