Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Referring Expression Segmentation on RefCOCO+ val

Metric: Overall IoU (higher is better)
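
The page does not define the metric. As a minimal sketch, assuming the definition commonly used in referring-segmentation work (intersection pixels summed over all samples, divided by union pixels summed over all samples), Overall IoU could be computed as below. The function name `overall_iou` and the nested-list mask format are illustrative assumptions, not part of any leaderboard tooling.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical mask format for illustration: 2-D nested sequence, nonzero = foreground.
Mask = Sequence[Sequence[int]]


def overall_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Cumulative intersection over cumulative union across all (pred, gt) pairs.

    Assumes the common "overall IoU" definition; returns a fraction in [0, 1].
    """
    total_inter = 0
    total_union = 0
    for pred, gt in pairs:
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                p_on, g_on = bool(p), bool(g)
                total_inter += int(p_on and g_on)
                total_union += int(p_on or g_on)
    # Guard against an empty evaluation set or all-empty masks.
    return total_inter / total_union if total_union else 0.0


# Two toy samples: intersections 1 and 2, unions 2 and 3.
pred1, gt1 = [[1, 1], [0, 0]], [[1, 0], [0, 0]]
pred2, gt2 = [[0, 0], [1, 1]], [[0, 1], [1, 1]]
score = overall_iou([(pred1, gt1), (pred2, gt2)])
print(round(100 * score, 2))  # (1 + 2) / (2 + 3) as a percentage
```

Because pixels are pooled before dividing, this value weights large objects more heavily than a per-sample mean IoU would; for the toy samples above the per-sample mean is about 58.3, while the pooled value is 60.0.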

Results

| # | Model | Overall IoU | Extra Data | Paper | Date | Code |
|---|-------|-------------|------------|-------|------|------|
| 1 | MLCD-Seg-7B | 79.4 | Yes | Multi-label Cluster Discrimination for Visual Re... | 2024-07-24 | Code |
| 2 | DeRIS-L | 79.01 | No | DeRIS: Decoupling Perception and Cognition for E... | 2025-07-02 | Code |
| 3 | HyperSeg | 79 | Yes | HyperSeg: Towards Universal Visual Segmentation ... | 2024-11-26 | Code |
| 4 | EVF-SAM | 76.5 | Yes | EVF-SAM: Early Vision-Language Fusion for Text-P... | 2024-06-28 | Code |
| 5 | DETRIS | 75.2 | No | Densely Connected Parameter-Efficient Tuning for... | 2025-01-15 | Code |
| 6 | C3VG | 74.68 | No | Multi-task Visual Grounding with Coarse-to-Fine ... | 2025-01-12 | Code |
| 7 | HIPIE | 73.9 | Yes | Hierarchical Open-vocabulary Universal Image Seg... | 2023-07-03 | Code |
| 8 | UniLSeg-100 | 73.18 | Yes | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 9 | UniLSeg-20 | 72.7 | Yes | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 10 | SegAgent | 72.49 | No | SegAgent: Exploring Pixel Understanding Capabili... | 2025-03-11 | Code |
| 11 | UNINEXT-H | 72.47 | Yes | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 12 | SafaRi-B | 70.78 | No | SafaRi:Adaptive Sequence Transformer for Weakly ... | 2024-07-02 | - |
| 13 | GROUNDHOG | 70.5 | Yes | GROUNDHOG: Grounding Large Language Models to Ho... | 2024-02-26 | - |
| 14 | MaskRIS (Swin-B, combined DB) | 70.26 | No | MaskRIS: Semantic Distortion-aware Data Augmenta... | 2024-11-28 | Code |
| 15 | GLEE-Pro | 69.6 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 16 | PolyFormer-L | 69.33 | Yes | PolyFormer: Referring Image Segmentation as Sequ... | 2023-02-14 | Code |
| 17 | PolyFormer-B | 67.64 | Yes | PolyFormer: Referring Image Segmentation as Sequ... | 2023-02-14 | Code |
| 18 | MaskRIS (Swin-B) | 67.54 | No | MaskRIS: Semantic Distortion-aware Data Augmenta... | 2024-11-28 | Code |
| 19 | MagNet | 66.16 | No | Mask Grounding for Referring Image Segmentation | 2023-12-19 | Code |
| 20 | ReLA | 66.04 | No | GRES: Generalized Referring Expression Segmentat... | 2023-06-01 | Code |
| 21 | VLT | 63.53 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 22 | CRIS | 62.27 | No | CRIS: CLIP-Driven Referring Image Segmentation | 2021-11-30 | Code |
| 23 | MaIL | 62.23 | No | MaIL: A Unified Mask-Image-Language Trimodal Net... | 2021-11-21 | - |
| 24 | LAVT | 62.14 | No | LAVT: Language-Aware Vision Transformer for Refe... | 2021-12-04 | Code |
| 25 | VLT | 55.5 | No | Vision-Language Transformer and Query Generation... | 2021-08-12 | Code |
| 26 | SHNet | 52.75 | No | Comprehensive Multi-Modal Interactions for Refer... | 2021-04-21 | Code |
| 27 | CPMC | 49.56 | No | Referring Image Segmentation via Cross-Modal Pro... | 2020-10-01 | Code |
| 28 | BRINet | 48.57 | No | - | - | - |
| 29 | STEP (5-fold) | 48.18 | No | - | - | - |
| 30 | MattNet | 46.67 | No | MAttNet: Modular Attention Network for Referring... | 2018-01-24 | Code |
| 31 | RefVOS with BERT + MLM loss | 44.71 | No | RefVOS: A Closer Look at Referring Expressions f... | 2020-10-01 | Code |
| 32 | CMSA | 43.76 | No | Cross-Modal Self-Attention Network for Referring... | 2019-04-09 | Code |