Referring Expression Segmentation on RefCoCo val
Metric: Overall IoU (higher is better)
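In the referring image segmentation literature, Overall IoU usually means cumulative IoU. The intersection and union pixel counts are summed over every expression in the split before dividing, so large objects carry more weight than they do in mean IoU (mIoU), which averages per-expression IoUs. The sketch below assumes that definition. It represents each mask as a set of foreground `(row, col)` pixels to stay dependency-free. The function names are illustrative and are not taken from any benchmark toolkit.

```python
from typing import Sequence, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of a foreground pixel
Mask = Set[Pixel]        # assumption: a binary mask stored as its set of foreground pixels


def overall_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Cumulative IoU: total intersection divided by total union over the whole split."""
    total_inter = 0
    total_union = 0
    for pred, gt in zip(preds, gts):
        total_inter += len(pred & gt)
        total_union += len(pred | gt)
    return total_inter / total_union if total_union else 0.0


def mean_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Per-expression IoU averaged over expressions (mIoU), shown for comparison."""
    ious = []
    for pred, gt in zip(preds, gts):
        union = len(pred | gt)
        # Assumption: an empty prediction for an empty target counts as a perfect match.
        ious.append(len(pred & gt) / union if union else 1.0)
    return sum(ious) / len(ious) if ious else 0.0


# Toy split: one large object segmented well, one small object missed entirely.
gt_large = {(r, c) for r in range(10) for c in range(10)}    # 100 pixels
pred_large = {(r, c) for r in range(9) for c in range(10)}   # 90 pixels, all inside gt_large
gt_small = {(20, 20), (20, 21), (21, 20), (21, 21)}          # 4 pixels
pred_small = {(30, 30), (30, 31), (31, 30), (31, 31)}        # 4 pixels, no overlap

preds, gts = [pred_large, pred_small], [gt_large, gt_small]
print(round(overall_iou(preds, gts), 4))  # 90 / 108
print(round(mean_iou(preds, gts), 4))     # (0.9 + 0.0) / 2
```

The missed small object adds only 8 of the 108 union pixels, so Overall IoU stays at about 0.83 while mIoU drops to 0.45. Multiplying by 100 gives the percentage scale used on this leaderboard.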
Results
| # | Model | Overall IoU | SOTA | Extra Data | Date | Paper | Code available |
|---|---|---|---|---|---|---|---|
| 1 | DeRIS-L | 85.41 | Yes | No | 2025-07-02 | DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy | Yes |
| 2 | HyperSeg | 84.8 | Yes | Yes | 2024-11-26 | HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Yes |
| 3 | PSALM | 83.6 | Yes | Yes | 2024-03-21 | PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | Yes |
| 4 | MLCD-Seg-7B | 83.6 | No | Yes | 2024-07-24 | Multi-label Cluster Discrimination for Visual Representation Learning | Yes |
| 5 | HIPIE | 82.8 | Yes | Yes | 2023-07-03 | Hierarchical Open-vocabulary Universal Image Segmentation | Yes |
| 6 | EVF-SAM | 82.4 | No | Yes | 2024-06-28 | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Yes |
| 7 | UNINEXT-H | 82.19 | Yes | Yes | 2023-03-12 | Universal Instance Perception as Object Discovery and Retrieval | Yes |
| 8 | UniLSeg-100 | 81.74 | No | Yes | 2023-12-04 | Universal Segmentation at Arbitrary Granularity with Language Instruction | Yes |
| 9 | DETRIS | 81 | No | No | 2025-01-15 | Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | Yes |
| 10 | C3VG | 80.89 | No | No | 2025-01-12 | Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Yes |
| 11 | GLEE-Pro | 80 | No | Yes | 2023-12-14 | General Object Foundation Model for Images and Videos at Scale | Yes |
| 12 | SegAgent | 79.7 | No | No | 2025-03-11 | SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Yes |
| 13 | MaskRIS (Swin-B, combined DB) | 78.71 | No | No | 2024-11-28 | MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation | Yes |
| 14 | GROUNDHOG | 78.5 | No | Yes | 2024-02-26 | GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | No |
| 15 | SafaRi-B | 77.21 | No | No | 2024-07-02 | SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | No |
| 16 | MaskRIS (Swin-B) | 76.49 | No | No | 2024-11-28 | MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation | Yes |
| 17 | PolyFormer-L | 75.96 | Yes | Yes | 2023-02-14 | PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | Yes |
| 18 | MagNet | 75.24 | No | No | 2023-12-19 | Mask Grounding for Referring Image Segmentation | Yes |
| 19 | PolyFormer-B | 74.82 | No | Yes | 2023-02-14 | PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | Yes |
| 20 | ReLA | 73.82 | No | No | 2023-06-01 | GRES: Generalized Referring Expression Segmentation | Yes |
| 21 | VPD | 73.25 | No | No | 2023-03-03 | Unleashing Text-to-Image Diffusion Models for Visual Perception | Yes |
| 22 | VLT | 72.96 | Yes | No | 2022-10-28 | VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | Yes |
| 23 | ETRIS | 71.06 | No | No | 2023-07-21 | Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation | Yes |
| 24 | RefTR | 70.56 | Yes | No | 2021-06-06 | Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Yes |
| 25 | CRIS | 70.47 | No | No | 2021-11-30 | CRIS: CLIP-Driven Referring Image Segmentation | Yes |
| 26 | MaIL | 70.13 | No | No | 2021-11-21 | MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation | No |
| 27 | VLT | 65.65 | No | No | 2021-08-12 | Vision-Language Transformer and Query Generation for Referring Segmentation | Yes |
| 28 | SHNet | 65.32 | Yes | No | 2021-04-21 | Comprehensive Multi-Modal Interactions for Referring Image Segmentation | Yes |
| 29 | CPMC | 61.36 | Yes | No | 2020-10-01 | Referring Image Segmentation via Cross-Modal Progressive Comprehension | Yes |
| 30 | BRINet | 61.35 | No | No | - | No paper | No |
| 31 | RefVOS with BERT + MLM loss | 59.45 | No | No | 2020-10-01 | RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation | Yes |
| 32 | LANG2SEG | 58.9 | Yes | No | 2019-10-10 | Referring Expression Object Segmentation with Caption-Aware Consistency | Yes |
| 33 | RefVOS with BERT Pre-train | 58.65 | No | No | 2020-10-01 | RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation | Yes |
| 34 | CMSA | 58.32 | Yes | No | 2019-04-09 | Cross-Modal Self-Attention Network for Referring Image Segmentation | Yes |
| 35 | STEP (1-fold) | 56.58 | No | No | - | No paper | No |
| 36 | MattNet | 56.51 | Yes | No | 2018-01-24 | MAttNet: Modular Attention Network for Referring Expression Comprehension | Yes |
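Thirteen entries carry the SOTA marker. They appear to be the results that beat every earlier-dated entry on this benchmark, which is a running maximum of Overall IoU ordered by publication date. The sketch below checks that reading against the rows above. The rule itself is an assumption and is not the site's documented logic, and that includes the choices that ties do not count, that only the best result of a given day can set a record, and that undated rows are skipped.

```python
from typing import List, Optional, Tuple

# (model, publication date as ISO string or None, Overall IoU), copied from the leaderboard above.
Entry = Tuple[str, Optional[str], float]

RESULTS: List[Entry] = [
    ("DeRIS-L", "2025-07-02", 85.41),
    ("HyperSeg", "2024-11-26", 84.8),
    ("PSALM", "2024-03-21", 83.6),
    ("MLCD-Seg-7B", "2024-07-24", 83.6),
    ("HIPIE", "2023-07-03", 82.8),
    ("EVF-SAM", "2024-06-28", 82.4),
    ("UNINEXT-H", "2023-03-12", 82.19),
    ("UniLSeg-100", "2023-12-04", 81.74),
    ("DETRIS", "2025-01-15", 81.0),
    ("C3VG", "2025-01-12", 80.89),
    ("GLEE-Pro", "2023-12-14", 80.0),
    ("SegAgent", "2025-03-11", 79.7),
    ("MaskRIS (Swin-B, combined DB)", "2024-11-28", 78.71),
    ("GROUNDHOG", "2024-02-26", 78.5),
    ("SafaRi-B", "2024-07-02", 77.21),
    ("MaskRIS (Swin-B)", "2024-11-28", 76.49),
    ("PolyFormer-L", "2023-02-14", 75.96),
    ("MagNet", "2023-12-19", 75.24),
    ("PolyFormer-B", "2023-02-14", 74.82),
    ("ReLA", "2023-06-01", 73.82),
    ("VPD", "2023-03-03", 73.25),
    ("VLT", "2022-10-28", 72.96),
    ("ETRIS", "2023-07-21", 71.06),
    ("RefTR", "2021-06-06", 70.56),
    ("CRIS", "2021-11-30", 70.47),
    ("MaIL", "2021-11-21", 70.13),
    ("VLT", "2021-08-12", 65.65),
    ("SHNet", "2021-04-21", 65.32),
    ("CPMC", "2020-10-01", 61.36),
    ("BRINet", None, 61.35),
    ("RefVOS with BERT + MLM loss", "2020-10-01", 59.45),
    ("LANG2SEG", "2019-10-10", 58.9),
    ("RefVOS with BERT Pre-train", "2020-10-01", 58.65),
    ("CMSA", "2019-04-09", 58.32),
    ("STEP (1-fold)", None, 56.58),
    ("MattNet", "2018-01-24", 56.51),
]


def sota_progression(results: List[Entry]) -> List[Entry]:
    """Return entries that beat every earlier-dated entry (assumed meaning of the SOTA marker)."""
    dated = [r for r in results if r[1] is not None]  # undated rows cannot be placed in time
    # ISO dates sort correctly as strings; on the same date the higher score comes first,
    # so only the best result of that day can set a new record.
    dated.sort(key=lambda r: (r[1], -r[2]))
    best = float("-inf")
    progression: List[Entry] = []
    for name, date, score in dated:
        if score > best:  # a tie with the current best is not a new state of the art
            progression.append((name, date, score))
            best = score
    return progression


for name, date, score in sota_progression(RESULTS):
    print(f"{date}  {score:6.2f}  {name}")
```

Run on these rows, the function returns the same 13 models that carry the SOTA marker, from MattNet (2018) through DeRIS-L (2025). MLCD-Seg-7B ties PSALM at 83.6 and is correctly left out under the strict comparison.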