Referring Expression Segmentation on Refer-YouTube-VOS (2021 public validation)
Metric: F (higher is better)
Results
| # | Model | F | Extra Data | Paper | Date | Code |
|---|-------|---|------------|-------|------|------|
| 1 | MPG-SAM 2 | 76.1 | No | MPG-SAM 2: Adapting SAM 2 with Mask Priors and G... | 2025-01-23 | Code |
| 2 | VRS-HQ (Chat-UniVi-13B) | 73.1 | No | The Devil is in Temporal Token: High Quality Vid... | 2025-01-15 | Code |
| 3 | GLEE-Pro | 72.9 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 4 | UNINEXT-H | 72.7 | No | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 5 | ReferDINO (Swin-B) | 71.5 | No | ReferDINO: Referring Video Object Segmentation w... | 2025-01-24 | - |
| 6 | MUTR | 70.4 | No | Referred by Multi-Modality: A Unified Temporal T... | 2023-05-25 | Code |
| 7 | VLP (VLMo-L) | 69.8 | No | Harnessing Vision-Language Pretrained Models wit... | 2024-05-17 | - |
| 8 | SOC (Joint training, Video-Swin-B) | 69.3 | No | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 9 | UniRef-L (Swin-L) | 69.2 | No | - | - | - |
| 10 | DsHmp (Video-Swin-Base) | 69.1 | No | Decoupling Static and Hierarchical Motion Percep... | 2024-04-04 | Code |
| 11 | UniRef++-L | 69 | No | UniRef++: Segment Every Reference Object in Spat... | 2023-12-25 | Code |
| 12 | HTR (Pre-training) | 68.9 | No | Temporally Consistent Referring Video Object Seg... | 2024-03-28 | Code |
| 13 | ViLLa | 68.6 | No | ViLLa: Video Reasoning Segmentation with Large L... | 2024-07-18 | Code |
| 14 | SgMg (Pre-training) | 67.4 | No | Spectrum-guided Multi-granularity Referring Vide... | 2023-07-25 | Code |
| 15 | EPCFormer (ViT-H) | 67.2 | No | Expression Prompt Collaboration Transformer for ... | 2023-08-08 | - |
| 16 | UniLSeg-100 | 67 | No | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 17 | GroPrompt | 66.9 | No | GroPrompt: Efficient Grounded Prompting and Adap... | 2024-06-18 | - |
| 18 | LoSh-R | 66 | Yes | LoSh: Long-Short Text Joint Prediction Network f... | 2023-06-14 | Code |
| 19 | VLT | 65.6 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 20 | OnlineRefer (Swin-L, online) | 65.5 | No | OnlineRefer: A Simple Online Baseline for Referr... | 2023-07-18 | Code |
| 21 | R2VOS (Video-Swin-T) | 63.1 | Yes | Towards Robust Referring Video Object Segmentati... | 2022-07-04 | Code |
| 22 | SOC (Video-Swin-T) | 60.5 | No | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 23 | UniVS(Swin-L) | 59.5 | Yes | UniVS: Unified and Universal Video Segmentation ... | 2024-02-28 | Code |
| 24 | ReferFormer (ResNet-101) | 58.4 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 25 | MTTR (w=12) | 56.64 | No | End-to-End Referring Video Object Segmentation w... | 2021-11-29 | Code |
| 26 | ReferFormer (ResNet-50) | 56.6 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 27 | MANET | 56.51 | No | Multi-Attention Network for Compressed Video Ref... | 2022-07-26 | Code |
| 28 | Locater | 51.1 | No | Local-Global Context Aware Transformer for Langu... | 2022-03-18 | Code |
| 29 | URVOS | 50.8 | No | - | - | Code |
| 30 | VLIDE | 50.67 | No | Deeply Interleaved Two-Stream Encoder for Referr... | 2022-03-30 | - |
| 31 | MLRLSA | 48.43 | No | - | - | - |