Instance Segmentation on RefCOCO+ test B
Metric: Overall IoU (higher is better)
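The page names the metric but does not define it. In the referring segmentation literature, overall IoU usually means the total intersection area divided by the total union area accumulated over all test samples, which differs from averaging per-sample IoU. The sketch below assumes that definition and reports the value as a percentage; the function names and the list-of-rows mask format are hypothetical choices for illustration, not taken from any submission's evaluation code.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask as rows of 0/1 values (assumed format)


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count foreground pixels in the intersection and union of one sample."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter, union


def overall_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Cumulative intersection over cumulative union, as a percentage."""
    total_inter = total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
    # Guard against an empty evaluation set or all-empty masks.
    return 100.0 * total_inter / total_union if total_union else 0.0


# Two toy 2x2 samples: intersections 1 and 3, unions 2 and 4.
preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
targets = [[[1, 0], [0, 0]], [[1, 1], [1, 0]]]
score = overall_iou(preds, targets)  # 100 * (1 + 3) / (2 + 4)
```

Because the areas are pooled before dividing, large objects weigh more than small ones under this definition. In the toy example the per-sample IoUs are 50% and 75%, so their mean (62.5%) differs from the pooled value.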
Results
| # | Model | Overall IoU | Extra Data | SOTA badge | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | MLCD-Seg-7B | 75.6 | Yes | Yes | Multi-label Cluster Discrimination for Visual Representation Learning | 2024-07-24 | Code |
| 2 | HyperSeg | 75.2 | Yes | No | HyperSeg: Towards Universal Visual Segmentation with Large Language Model | 2024-11-26 | Code |
| 3 | EVF-SAM | 71.9 | Yes | Yes | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | 2024-06-28 | Code |
| 4 | DETRIS | 70.2 | No | No | Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | 2025-01-15 | Code |
| 5 | C3VG | 68.95 | No | No | Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | 2025-01-12 | Code |
| 6 | UniLSeg-100 | 68.15 | Yes | Yes | Universal Segmentation at Arbitrary Granularity with Language Instruction | 2023-12-04 | Code |
| 7 | UniLSeg-20 | 66.99 | Yes | No | Universal Segmentation at Arbitrary Granularity with Language Instruction | 2023-12-04 | Code |
| 8 | UNINEXT-H | 66.22 | Yes | Yes | Universal Instance Perception as Object Discovery and Retrieval | 2023-03-12 | Code |
| 9 | GROUNDHOG | 64.9 | Yes | No | GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | 2024-02-26 | - |
| 10 | SafaRi-B | 64.88 | No | No | SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | 2024-07-02 | - |
| 11 | MaskRIS (Swin-B, combined DB) | 62.83 | No | No | MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation | 2024-11-28 | Code |
| 12 | PolyFormer-L | 61.87 | Yes | Yes | PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | 2023-02-14 | Code |
| 13 | MaskRIS (Swin-B) | 59.39 | No | No | MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation | 2024-11-28 | Code |
| 14 | PolyFormer-B | 59.33 | Yes | No | PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | 2023-02-14 | Code |
| 15 | MagNet | 58.14 | No | No | Mask Grounding for Referring Image Segmentation | 2023-12-19 | Code |
| 16 | ReLA | 57.65 | No | No | GRES: Generalized Referring Expression Segmentation | 2023-06-01 | Code |
| 17 | VLT | 56.92 | No | Yes | VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | 2022-10-28 | Code |
| 18 | MaIL | 56.06 | No | Yes | MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation | 2021-11-21 | - |
| 19 | LAVT | 55.1 | No | No | LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | 2021-12-04 | Code |
| 20 | CRIS | 53.68 | No | No | CRIS: CLIP-Driven Referring Image Segmentation | 2021-11-30 | Code |
| 21 | VLT | 49.36 | No | Yes | Vision-Language Transformer and Query Generation for Referring Segmentation | 2021-08-12 | Code |
| 22 | SHNet | 44.12 | No | Yes | Comprehensive Multi-Modal Interactions for Referring Image Segmentation | 2021-04-21 | Code |
| 23 | CPMC | 43.23 | No | Yes | Referring Image Segmentation via Cross-Modal Progressive Comprehension | 2020-10-01 | Code |
| 24 | BRINet | 42.13 | No | No | - | - | - |
| 25 | STEP (5-fold) | 40.41 | No | No | - | - | - |
| 26 | MattNet | 40.08 | No | Yes | MAttNet: Modular Attention Network for Referring Expression Comprehension | 2018-01-24 | Code |
| 27 | CMSA | 37.89 | No | No | Cross-Modal Self-Attention Network for Referring Image Segmentation | 2019-04-09 | Code |
| 28 | RefVOS with BERT + MLM loss | 36.17 | No | No | RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation | 2020-10-01 | Code |
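The page does not explain the SOTA badge that some entries carry. One reading that appears consistent with the listed rows is that a badge marks an entry whose score exceeded every earlier-dated entry when it was published. The sketch below checks that reading against the dated rows copied from the leaderboard; the interpretation and the function name `record_setters` are assumptions, and the two undated entries (BRINet, STEP) are left out because they cannot be placed on the timeline.

```python
# (model, Overall IoU, publication date) copied from the dated leaderboard rows.
ROWS = [
    ("MLCD-Seg-7B", 75.6, "2024-07-24"),
    ("HyperSeg", 75.2, "2024-11-26"),
    ("EVF-SAM", 71.9, "2024-06-28"),
    ("DETRIS", 70.2, "2025-01-15"),
    ("C3VG", 68.95, "2025-01-12"),
    ("UniLSeg-100", 68.15, "2023-12-04"),
    ("UniLSeg-20", 66.99, "2023-12-04"),
    ("UNINEXT-H", 66.22, "2023-03-12"),
    ("GROUNDHOG", 64.9, "2024-02-26"),
    ("SafaRi-B", 64.88, "2024-07-02"),
    ("MaskRIS (Swin-B, combined DB)", 62.83, "2024-11-28"),
    ("PolyFormer-L", 61.87, "2023-02-14"),
    ("MaskRIS (Swin-B)", 59.39, "2024-11-28"),
    ("PolyFormer-B", 59.33, "2023-02-14"),
    ("MagNet", 58.14, "2023-12-19"),
    ("ReLA", 57.65, "2023-06-01"),
    ("VLT", 56.92, "2022-10-28"),
    ("MaIL", 56.06, "2021-11-21"),
    ("LAVT", 55.1, "2021-12-04"),
    ("CRIS", 53.68, "2021-11-30"),
    ("VLT", 49.36, "2021-08-12"),
    ("SHNet", 44.12, "2021-04-21"),
    ("CPMC", 43.23, "2020-10-01"),
    ("MattNet", 40.08, "2018-01-24"),
    ("CMSA", 37.89, "2019-04-09"),
    ("RefVOS with BERT + MLM loss", 36.17, "2020-10-01"),
]


def record_setters(rows):
    """Return rows whose score beat every earlier-dated score (assumed badge rule)."""
    best = float("-inf")
    setters = []
    # ISO dates sort correctly as strings; on equal dates the higher score goes first,
    # so only the best entry of a given day can set a record.
    for model, score, date in sorted(rows, key=lambda r: (r[2], -r[1])):
        if score > best:
            best = score
            setters.append((model, score, date))
    return setters


badged = record_setters(ROWS)
```

Under this assumption `badged` holds 11 entries, running from MattNet (2018-01-24) to MLCD-Seg-7B (2024-07-24), which matches the entries that carry the badge on the page.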