Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Referring Expression Segmentation on RefCOCO+ test B

Metric: Overall IoU (higher is better)
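
The page does not spell out how Overall IoU is computed. In referring image segmentation papers it usually denotes cumulative IoU: the intersections of all predicted and ground-truth masks are summed over the whole split and divided by the summed unions, rather than averaging per-sample IoU. The sketch below assumes that definition, with one binary mask per referring expression stored as a 2-D grid of 0/1 values. The function names (`intersection_union`, `overall_iou`, `mean_iou`) are illustrative and not taken from any benchmark toolkit. The leaderboard reports the score as a percentage, so the example multiplies by 100.

```python
from typing import List, Sequence, Tuple

# Assumption: a binary mask is a 2-D grid of 0/1 values, one per referring expression.
Mask = Sequence[Sequence[int]]


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count pixels in pred AND gt (intersection) and in pred OR gt (union)."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground-truth masks must have the same shape")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter, union


def overall_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """Cumulative IoU: total intersection divided by total union over the split."""
    if len(preds) != len(gts):
        raise ValueError("need one ground-truth mask per prediction")
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


def mean_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """Per-sample IoU averaged over samples, shown only for contrast.

    Assumption: a sample whose prediction and ground truth are both empty scores 1.0.
    """
    scores = []
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores) if scores else 0.0


# Two toy samples: a half-correct small mask and a perfect larger mask.
preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
gts = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
print(round(100 * overall_iou(preds, gts), 2))  # 83.33
print(round(100 * mean_iou(preds, gts), 2))     # 75.0
```

Because intersections and unions are pooled before dividing, samples with large target objects contribute more pixels and therefore weigh more in Overall IoU than in per-sample mean IoU, which is why the two toy scores differ.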

Results

| # | Model | Overall IoU (%) | Extra Data | Paper | Date | Code |
|---|-------|-----------------|------------|-------|------|------|
| 1 | MLCD-Seg-7B | 75.6 | Yes | Multi-label Cluster Discrimination for Visual Re... | 2024-07-24 | Code |
| 2 | HyperSeg | 75.2 | Yes | HyperSeg: Towards Universal Visual Segmentation ... | 2024-11-26 | Code |
| 3 | EVF-SAM | 71.9 | Yes | EVF-SAM: Early Vision-Language Fusion for Text-P... | 2024-06-28 | Code |
| 4 | DETRIS | 70.2 | No | Densely Connected Parameter-Efficient Tuning for... | 2025-01-15 | Code |
| 5 | C3VG | 68.95 | No | Multi-task Visual Grounding with Coarse-to-Fine ... | 2025-01-12 | Code |
| 6 | UniLSeg-100 | 68.15 | Yes | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 7 | UniLSeg-20 | 66.99 | Yes | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 8 | UNINEXT-H | 66.22 | Yes | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 9 | GROUNDHOG | 64.9 | Yes | GROUNDHOG: Grounding Large Language Models to Ho... | 2024-02-26 | - |
| 10 | SafaRi-B | 64.88 | No | SafaRi:Adaptive Sequence Transformer for Weakly ... | 2024-07-02 | - |
| 11 | MaskRIS (Swin-B, combined DB) | 62.83 | No | MaskRIS: Semantic Distortion-aware Data Augmenta... | 2024-11-28 | Code |
| 12 | PolyFormer-L | 61.87 | Yes | PolyFormer: Referring Image Segmentation as Sequ... | 2023-02-14 | Code |
| 13 | MaskRIS (Swin-B) | 59.39 | No | MaskRIS: Semantic Distortion-aware Data Augmenta... | 2024-11-28 | Code |
| 14 | PolyFormer-B | 59.33 | Yes | PolyFormer: Referring Image Segmentation as Sequ... | 2023-02-14 | Code |
| 15 | MagNet | 58.14 | No | Mask Grounding for Referring Image Segmentation | 2023-12-19 | Code |
| 16 | ReLA | 57.65 | No | GRES: Generalized Referring Expression Segmentat... | 2023-06-01 | Code |
| 17 | VLT | 56.92 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 18 | MaIL | 56.06 | No | MaIL: A Unified Mask-Image-Language Trimodal Net... | 2021-11-21 | - |
| 19 | LAVT | 55.1 | No | LAVT: Language-Aware Vision Transformer for Refe... | 2021-12-04 | Code |
| 20 | CRIS | 53.68 | No | CRIS: CLIP-Driven Referring Image Segmentation | 2021-11-30 | Code |
| 21 | VLT | 49.36 | No | Vision-Language Transformer and Query Generation... | 2021-08-12 | Code |
| 22 | SHNet | 44.12 | No | Comprehensive Multi-Modal Interactions for Refer... | 2021-04-21 | Code |
| 23 | CPMC | 43.23 | No | Referring Image Segmentation via Cross-Modal Pro... | 2020-10-01 | Code |
| 24 | BRINet | 42.13 | No | - | - | - |
| 25 | STEP (5-fold) | 40.41 | No | - | - | - |
| 26 | MattNet | 40.08 | No | MAttNet: Modular Attention Network for Referring... | 2018-01-24 | Code |
| 27 | CMSA | 37.89 | No | Cross-Modal Self-Attention Network for Referring... | 2019-04-09 | Code |
| 28 | RefVOS with BERT + MLM loss | 36.17 | No | RefVOS: A Closer Look at Referring Expressions f... | 2020-10-01 | Code |