Referring Expression Segmentation on Refer-YouTube-VOS (2021 public validation)
Metric: J&F (higher is better)
Results
| # | Model | J&F | Extra Data | Paper | Date | Code |
|---|-------|-----|------------|-------|------|------|
| 1 | MPG-SAM 2 | 73.9 | No | MPG-SAM 2: Adapting SAM 2 with Mask Priors and G... | 2025-01-23 | Code |
| 2 | VRS-HQ (Chat-UniVi-13B) | 71 | No | The Devil is in Temporal Token: High Quality Vid... | 2025-01-15 | Code |
| 3 | GLEE-Pro | 70.6 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 4 | UNINEXT-H | 70.1 | No | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 5 | ReferDINO (Swin-B) | 69.3 | No | ReferDINO: Referring Video Object Segmentation w... | 2025-01-24 | - |
| 6 | MUTR | 68.4 | No | Referred by Multi-Modality: A Unified Temporal T... | 2023-05-25 | Code |
| 7 | VLP (VLMo-L) | 67.6 | No | Harnessing Vision-Language Pretrained Models wit... | 2024-05-17 | - |
| 8 | UniRef-L (Swin-L) | 67.4 | No | - | - | - |
| 9 | HTR (Pre-training) | 67.1 | No | Temporally Consistent Referring Video Object Seg... | 2024-03-28 | Code |
| 10 | DsHmp (Video-Swin-Base) | 67.1 | No | Decoupling Static and Hierarchical Motion Percep... | 2024-04-04 | Code |
| 11 | UniRef++-L | 66.9 | No | UniRef++: Segment Every Reference Object in Spat... | 2023-12-25 | Code |
| 12 | ViLLa | 66.5 | No | ViLLa: Video Reasoning Segmentation with Large L... | 2024-07-18 | Code |
| 13 | DEVA (ReferFormer) | 66 | Yes | Tracking Anything with Decoupled Video Segmentat... | 2023-09-07 | Code |
| 14 | SgMg (Pre-training) | 65.7 | No | Spectrum-guided Multi-granularity Referring Vide... | 2023-07-25 | Code |
| 15 | GroPrompt | 65.5 | No | GroPrompt: Efficient Grounded Prompting and Adap... | 2024-06-18 | - |
| 16 | EPCFormer (ViT-H) | 65 | No | Expression Prompt Collaboration Transformer for ... | 2023-08-08 | - |
| 17 | UniLSeg-100 | 64.9 | No | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 18 | LoSh-R | 64.2 | Yes | LoSh: Long-Short Text Joint Prediction Network f... | 2023-06-14 | Code |
| 19 | VLT | 63.8 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 20 | OnlineRefer (Swin-L, online) | 63.5 | No | OnlineRefer: A Simple Online Baseline for Referr... | 2023-07-18 | Code |
| 21 | R2VOS (Video-Swin-T) | 61.3 | Yes | Towards Robust Referring Video Object Segmentati... | 2022-07-04 | Code |
| 22 | SOC (Video-Swin-T) | 59.2 | No | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 23 | UniVS (Swin-L) | 58 | Yes | UniVS: Unified and Universal Video Segmentation ... | 2024-02-28 | Code |
| 24 | ReferFormer (ResNet-101) | 57.3 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 25 | MANET | 55.63 | No | Multi-Attention Network for Compressed Video Ref... | 2022-07-26 | Code |
| 26 | ReferFormer (ResNet-50) | 55.6 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 27 | MTTR (w=12) | 55.32 | No | End-to-End Referring Video Object Segmentation w... | 2021-11-29 | Code |
| 28 | Locater | 50 | No | Local-Global Context Aware Transformer for Langu... | 2022-03-18 | Code |
| 29 | MLRLSA | 49.7 | No | - | - | - |
| 30 | VLIDE | 49.56 | No | Deeply Interleaved Two-Stream Encoder for Referr... | 2022-03-30 | - |
| 31 | URVOS | 48.9 | No | - | - | Code |
| 32 | InternVideo2.5 | 34.2 | No | InternVideo2.5: Empowering Video MLLMs with Long... | 2025-01-21 | Code |