Instance Segmentation on Refer-YouTube-VOS (2021 public validation)
Metric: J (higher is better)
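For readers unfamiliar with the metric: in video object segmentation benchmarks, J conventionally denotes region similarity, i.e. the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask, averaged over the evaluated frames. The sketch below illustrates that computation on plain nested lists of 0/1 values. It is not the benchmark's official evaluation code; the function names `jaccard` and `mean_jaccard`, the list-based mask representation, and the choice to score two empty masks as 1.0 are all assumptions made for illustration.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (illustrative representation only).
Mask = Sequence[Sequence[int]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_jaccard(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average per-frame Jaccard over a sequence of frames."""
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred_frames = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]
    gt_frames = [[[1, 0], [0, 0]], [[0, 0], [0, 0]]]
    # Multiply by 100 to compare with percentage-style scores.
    print(100 * mean_jaccard(pred_frames, gt_frames))
```

The values on this leaderboard appear to be on a 0 to 100 scale, so a mean Jaccard of 0.717 would correspond to a listed score of 71.7.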
Results
| # | Model | J | Extra Data | Paper | Date | Code |
|---|-------|---|------------|-------|------|------|
| 1 | MPG-SAM 2 | 71.7 | No | MPG-SAM 2: Adapting SAM 2 with Mask Priors and G... | 2025-01-23 | Code |
| 2 | VRS-HQ (Chat-UniVi-13B) | 69 | No | The Devil is in Temporal Token: High Quality Vid... | 2025-01-15 | Code |
| 3 | GLEE-Pro | 68.2 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 4 | UNINEXT-H | 67.6 | No | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 5 | ReferDINO (Swin-B) | 67 | No | ReferDINO: Referring Video Object Segmentation w... | 2025-01-24 | - |
| 6 | MUTR | 66.4 | No | Referred by Multi-Modality: A Unified Temporal T... | 2023-05-25 | Code |
| 7 | UniRef-L (Swin-L) | 65.5 | No | - | - | - |
| 8 | VLP (VLMo-L) | 65.3 | No | Harnessing Vision-Language Pretrained Models wit... | 2024-05-17 | - |
| 9 | SOC (Joint training, Video-Swin-B) | 65.3 | No | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 10 | HTR (Pre-training) | 65.3 | No | Temporally Consistent Referring Video Object Seg... | 2024-03-28 | Code |
| 11 | DsHmp (Video-Swin-Base) | 65 | No | Decoupling Static and Hierarchical Motion Percep... | 2024-04-04 | Code |
| 12 | UniRef++-L | 64.8 | No | UniRef++: Segment Every Reference Object in Spat... | 2023-12-25 | Code |
| 13 | ViLLa | 64.6 | No | ViLLa: Video Reasoning Segmentation with Large L... | 2024-07-18 | Code |
| 14 | GroPrompt | 64.1 | No | GroPrompt: Efficient Grounded Prompting and Adap... | 2024-06-18 | - |
| 15 | SgMg (Pre-training) | 63.9 | No | Spectrum-guided Multi-granularity Referring Vide... | 2023-07-25 | Code |
| 16 | EPCFormer (ViT-H) | 62.9 | No | Expression Prompt Collaboration Transformer for ... | 2023-08-08 | - |
| 17 | UniLSeg-100 | 62.8 | No | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 18 | LoSh-R | 62.5 | Yes | LoSh: Long-Short Text Joint Prediction Network f... | 2023-06-14 | Code |
| 19 | VLT | 61.9 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 20 | OnlineRefer (Swin-L, online) | 61.6 | No | OnlineRefer: A Simple Online Baseline for Referr... | 2023-07-18 | Code |
| 21 | R2VOS (Video-Swin-T) | 59.6 | Yes | Towards Robust Referring Video Object Segmentati... | 2022-07-04 | Code |
| 22 | SOC (Video-Swin-T) | 57.8 | No | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 23 | UniVS(Swin-L) | 56.8 | Yes | UniVS: Unified and Universal Video Segmentation ... | 2024-02-28 | Code |
| 24 | ReferFormer (ResNet-101) | 56.1 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 25 | ReferFormer (ResNet-50) | 54.8 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 26 | MANET | 54.75 | No | Multi-Attention Network for Compressed Video Ref... | 2022-07-26 | Code |
| 27 | MTTR (w=12) | 54 | No | End-to-End Referring Video Object Segmentation w... | 2021-11-29 | Code |
| 28 | MLRLSA | 50.96 | No | - | - | - |
| 29 | Locater | 48.8 | No | Local-Global Context Aware Transformer for Langu... | 2022-03-18 | Code |
| 30 | VLIDE | 48.44 | No | Deeply Interleaved Two-Stream Encoder for Referr... | 2022-03-30 | - |
| 31 | URVOS | 47 | No | - | - | Code |