Papers With Code 2

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Instance Segmentation on RefCOCO+ testA

Metric: Overall IoU (higher is better)

Results

| Rank | Model | Overall IoU (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | HyperSeg | 83.5 | Yes | HyperSeg: Towards Universal Visual Segmentation ... | 2024-11-26 | Code |
| 2 | MLCD-Seg-7B | 82.9 | Yes | Multi-label Cluster Discrimination for Visual Re... | 2024-07-24 | Code |
| 3 | DeRIS-L | 82.34 | No | DeRIS: Decoupling Perception and Cognition for E... | 2025-07-02 | Code |
| 4 | EVF-SAM | 80 | Yes | EVF-SAM: Early Vision-Language Fusion for Text-P... | 2024-06-28 | Code |
| 5 | DETRIS | 78.6 | No | Densely Connected Parameter-Efficient Tuning for... | 2025-01-15 | Code |
| 6 | UniLSeg-100 | 78.29 | Yes | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 7 | C3VG | 77.96 | No | Multi-task Visual Grounding with Coarse-to-Fine ... | 2025-01-12 | Code |
| 8 | UniLSeg-20 | 77.02 | Yes | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 9 | UNINEXT-H | 76.42 | Yes | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 10 | MaskRIS (Swin-B, combined DB) | 75.15 | No | MaskRIS: Semantic Distortion-aware Data Augmenta... | 2024-11-28 | Code |
| 11 | GROUNDHOG | 75 | Yes | GROUNDHOG: Grounding Large Language Models to Ho... | 2024-02-26 | - |
| 12 | PolyFormer-L | 74.56 | Yes | PolyFormer: Referring Image Segmentation as Sequ... | 2023-02-14 | Code |
| 13 | SafaRi-B | 74.53 | No | SafaRi: Adaptive Sequence Transformer for Weakly ... | 2024-07-02 | - |
| 14 | MaskRIS (Swin-B) | 74.46 | No | MaskRIS: Semantic Distortion-aware Data Augmenta... | 2024-11-28 | Code |
| 15 | PolyFormer-B | 72.89 | Yes | PolyFormer: Referring Image Segmentation as Sequ... | 2023-02-14 | Code |
| 16 | MagNet | 71.32 | No | Mask Grounding for Referring Image Segmentation | 2023-12-19 | Code |
| 17 | ReLA | 71.02 | No | GRES: Generalized Referring Expression Segmentat... | 2023-06-01 | Code |
| 18 | VLT | 68.43 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 19 | LAVT | 68.38 | No | LAVT: Language-Aware Vision Transformer for Refe... | 2021-12-04 | Code |
| 20 | CRIS | 68.08 | No | CRIS: CLIP-Driven Referring Image Segmentation | 2021-11-30 | Code |
| 21 | MaIL | 65.92 | No | MaIL: A Unified Mask-Image-Language Trimodal Net... | 2021-11-21 | - |
| 22 | VLT | 59.2 | No | Vision-Language Transformer and Query Generation... | 2021-08-12 | Code |
| 23 | SHNet | 58.46 | No | Comprehensive Multi-Modal Interactions for Refer... | 2021-04-21 | Code |
| 24 | CPMC | 53.44 | No | Referring Image Segmentation via Cross-Modal Pro... | 2020-10-01 | Code |
| 25 | BRINet | 52.87 | No | - | - | - |
| 26 | MattNet | 52.39 | No | MAttNet: Modular Attention Network for Referring... | 2018-01-24 | Code |
| 27 | STEP (5-fold) | 52.33 | No | - | - | - |
| 28 | RefVOS with BERT + MLM Loss | 49.73 | No | RefVOS: A Closer Look at Referring Expressions f... | 2020-10-01 | Code |
| 29 | CMSA | 47.6 | No | Cross-Modal Self-Attention Network for Referring... | 2019-04-09 | Code |