Video on Refer-YouTube-VOS
Metric: J&F (higher is better)
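For readers unfamiliar with the metric: in DAVIS-style video segmentation evaluation, J&F is conventionally the mean of region similarity J (the Jaccard index, or IoU, between predicted and ground-truth masks) and contour accuracy F (a boundary F-measure). The sketch below illustrates only the J term and the final averaging; it is not the benchmark's official evaluation code. The function names are hypothetical, masks are assumed to be nested lists of 0/1 values, and the F score is taken as an already computed input because boundary matching is beyond a short example.

```python
def region_similarity(pred, gt):
    """Jaccard index (IoU) between two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_score, f_score):
    """Combine region similarity J and contour accuracy F by simple averaging."""
    return (j_score + f_score) / 2.0


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    gt = [[1, 0], [0, 0]]
    j = region_similarity(pred, gt)
    # f_score here is a made-up illustrative value, not a measured result.
    print(j, j_and_f(j, f_score=1.0))
```

Leaderboard values are reported on a 0 to 100 scale, so a score computed this way would be multiplied by 100 before comparison.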
Results
| #  | Model               | J&F  | Extra Data | Paper                                               | Date       | Code |
|----|---------------------|------|------------|-----------------------------------------------------|------------|------|
| 1  | FindTrack           | 73.7 | Yes        | Find First, Track Next: Decoupling Identificatio... | 2025-03-05 | Code |
| 2  | GLEE-Pro            | 70.6 | Yes        | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 3  | HyperSeg            | 68.5 | Yes        | HyperSeg: Towards Universal Visual Segmentation ... | 2024-11-26 | Code |
| 4  | GLEE-Plus           | 67.7 | Yes        | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 5  | HTR                 | 67.1 | Yes        | Temporally Consistent Referring Video Object Seg... | 2024-03-28 | Code |
| 6  | SOC                 | 66   | Yes        | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 7  | SgMg                | 65.7 | Yes        | Spectrum-guided Multi-granularity Referring Vide... | 2023-07-25 | Code |
| 8  | VATEX               | 65.4 | No         | Vision-Aware Text Features in Referring Image Se... | 2024-04-12 | Code |
| 9  | VLT                 | 63.8 | Yes        | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 10 | HTML-SwinL          | 63.4 | Yes        | -                                                   | -          | -    |
| 11 | HTML-Video-SwinB    | 63.4 | Yes        | -                                                   | -          | -    |
| 12 | ReferFormer (Large) | 62.9 | Yes        | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 13 | HTML-Video-SwinS    | 61.4 | Yes        | -                                                   | -          | -    |
| 14 | HTML-Video-SwinT    | 61.2 | Yes        | -                                                   | -          | -    |
| 15 | R2VOS (Swin-T)      | 60.2 | No         | Towards Robust Referring Video Object Segmentati... | 2022-07-04 | Code |
| 16 | HTML-ResNet101      | 58.5 | Yes        | -                                                   | -          | -    |
| 17 | HTML-ResNet50       | 57.8 | Yes        | -                                                   | -          | -    |
| 18 | CMSA                | 36.4 | Yes        | Cross-Modal Self-Attention Network for Referring... | 2019-04-09 | Code |