Video on Refer-YouTube-VOS
Metric: F (higher is better)
Results
| # | Model | F | Extra Data | Paper | Date | Code |
|---|-------|---|------------|-------|------|------|
| 1 | FindTrack | 75.7 | Yes | Find First, Track Next: Decoupling Identificatio... | 2025-03-05 | Code |
| 2 | GLEE-Pro | 72.9 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 3 | GLEE-Plus | 69.7 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 4 | HTR | 68.9 | Yes | Temporally Consistent Referring Video Object Seg... | 2024-03-28 | Code |
| 5 | SOC | 67.9 | Yes | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 6 | VATEX | 67.5 | No | Vision-Aware Text Features in Referring Image Se... | 2024-04-12 | Code |
| 7 | SgMg | 67.4 | Yes | Spectrum-guided Multi-granularity Referring Vide... | 2023-07-25 | Code |
| 8 | VLT | 65.6 | Yes | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 9 | HTML-SwinL | 65.3 | Yes | - | - | - |
| 10 | HTML-Video-SwinB | 65.2 | Yes | - | - | - |
| 11 | ReferFormer (Large) | 64.6 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 12 | HTML-Video-SwinT | 63 | Yes | - | - | - |
| 13 | HTML-Video-SwinS | 62.9 | Yes | - | - | - |
| 14 | R2VOS (Swin-T) | 61.5 | No | Towards Robust Referring Video Object Segmentati... | 2022-07-04 | Code |
| 15 | HTML-ResNet101 | 59.8 | Yes | - | - | - |
| 16 | HTML-ResNet50 | 59 | Yes | - | - | - |
| 17 | CMSA | 38.1 | Yes | Cross-Modal Self-Attention Network for Referring... | 2019-04-09 | Code |