Instance Segmentation on RefCOCO+ testA
Metric: Overall IoU (higher is better)
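The page does not define the metric. The sketch below assumes the definition commonly used in referring segmentation work: the intersection pixels summed over all test samples, divided by the union pixels summed over all test samples, reported as a percentage. It is an illustration only, not the evaluation code behind any entry here; the function name `overall_iou` and the toy masks are hypothetical.

```python
from typing import Iterable, List, Tuple

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def overall_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Cumulative intersection over cumulative union, as a percentage.

    Assumption: pixels are accumulated over the whole test set before
    dividing, so large objects weigh more than small ones.
    """
    total_inter = 0
    total_union = 0
    for pred, target in pairs:
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                total_inter += 1 if (p and t) else 0
                total_union += 1 if (p or t) else 0
    if total_union == 0:
        return 0.0  # no foreground anywhere; avoid division by zero
    return 100.0 * total_inter / total_union


# Two toy 2x2 (prediction, ground truth) pairs, hypothetical data.
examples = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # intersection 1, union 2
    ([[0, 0], [1, 1]], [[0, 0], [1, 1]]),  # intersection 2, union 2
]
print(overall_iou(examples))  # 3 / 4 of the union is covered -> 75.0
```

Under this assumption the score differs from a per-sample mean IoU, which would average 50% and 100% for the two toy pairs and happen to give the same 75% here only because both unions are equal in size.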
Results
| # | Model | Overall IoU | Extra Data | Paper | Date | Code |
|---|-------|-------------|------------|-------|------|------|
| 1 | HyperSeg | 83.5 | Yes | HyperSeg: Towards Universal Visual Segmentation ... | 2024-11-26 | Code |
| 2 | MLCD-Seg-7B | 82.9 | Yes | Multi-label Cluster Discrimination for Visual Re... | 2024-07-24 | Code |
| 3 | DeRIS-L | 82.34 | No | DeRIS: Decoupling Perception and Cognition for E... | 2025-07-02 | Code |
| 4 | EVF-SAM | 80 | Yes | EVF-SAM: Early Vision-Language Fusion for Text-P... | 2024-06-28 | Code |
| 5 | DETRIS | 78.6 | No | Densely Connected Parameter-Efficient Tuning for... | 2025-01-15 | Code |
| 6 | UniLSeg-100 | 78.29 | Yes | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 7 | C3VG | 77.96 | No | Multi-task Visual Grounding with Coarse-to-Fine ... | 2025-01-12 | Code |
| 8 | UniLSeg-20 | 77.02 | Yes | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 9 | UNINEXT-H | 76.42 | Yes | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 10 | MaskRIS (Swin-B, combined DB) | 75.15 | No | MaskRIS: Semantic Distortion-aware Data Augmenta... | 2024-11-28 | Code |
| 11 | GROUNDHOG | 75 | Yes | GROUNDHOG: Grounding Large Language Models to Ho... | 2024-02-26 | - |
| 12 | PolyFormer-L | 74.56 | Yes | PolyFormer: Referring Image Segmentation as Sequ... | 2023-02-14 | Code |
| 13 | SafaRi-B | 74.53 | No | SafaRi:Adaptive Sequence Transformer for Weakly ... | 2024-07-02 | - |
| 14 | MaskRIS (Swin-B) | 74.46 | No | MaskRIS: Semantic Distortion-aware Data Augmenta... | 2024-11-28 | Code |
| 15 | PolyFormer-B | 72.89 | Yes | PolyFormer: Referring Image Segmentation as Sequ... | 2023-02-14 | Code |
| 16 | MagNet | 71.32 | No | Mask Grounding for Referring Image Segmentation | 2023-12-19 | Code |
| 17 | ReLA | 71.02 | No | GRES: Generalized Referring Expression Segmentat... | 2023-06-01 | Code |
| 18 | VLT | 68.43 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 19 | LAVT | 68.38 | No | LAVT: Language-Aware Vision Transformer for Refe... | 2021-12-04 | Code |
| 20 | CRIS | 68.08 | No | CRIS: CLIP-Driven Referring Image Segmentation | 2021-11-30 | Code |
| 21 | MaIL | 65.92 | No | MaIL: A Unified Mask-Image-Language Trimodal Net... | 2021-11-21 | - |
| 22 | VLT | 59.2 | No | Vision-Language Transformer and Query Generation... | 2021-08-12 | Code |
| 23 | SHNet | 58.46 | No | Comprehensive Multi-Modal Interactions for Refer... | 2021-04-21 | Code |
| 24 | CPMC | 53.44 | No | Referring Image Segmentation via Cross-Modal Pro... | 2020-10-01 | Code |
| 25 | BRINet | 52.87 | No | - | - | - |
| 26 | MattNet | 52.39 | No | MAttNet: Modular Attention Network for Referring... | 2018-01-24 | Code |
| 27 | STEP (5-fold) | 52.33 | No | - | - | - |
| 28 | RefVOS with BERT + MLM Loss | 49.73 | No | RefVOS: A Closer Look at Referring Expressions f... | 2020-10-01 | Code |
| 29 | CMSA | 47.6 | No | Cross-Modal Self-Attention Network for Referring... | 2019-04-09 | Code |