Video Object Segmentation on MeViS
Metric: J (higher is better)
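In video object segmentation, J usually denotes region similarity: the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask. The sketch below shows one way to compute it for binary masks. The helper names `jaccard` and `mean_j` are hypothetical, and three details are assumptions this page does not confirm: the plain mean over mask pairs, the 0-100 scaling, and the handling of two empty masks. The official MeViS evaluation protocol may differ on all three.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels


def jaccard(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_j(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average J over mask pairs, scaled to 0-100 (assumed leaderboard scale)."""
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    frame_a_pred = [[1, 1], [0, 0]]
    frame_a_gt = [[1, 0], [0, 0]]
    frame_b = [[0, 1], [0, 1]]
    print(jaccard(frame_a_pred, frame_a_gt))  # 1 shared pixel out of 2 in the union
    print(mean_j([frame_a_pred, frame_b], [frame_a_gt, frame_b]))
```

A higher value means the predicted masks overlap the ground truth more closely, which is why the leaderboard ranks larger J first.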
Results
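| # | Model | J | Extra Data | Paper | Date | Code available |
|---|-------|---|------------|-------|------|----------------|
| 1 | MPG-SAM 2 | 50.7 | No | MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | 2025-01-23 | Yes |
| 2 | FindTrack | 50.5 | No | Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | 2025-03-05 | Yes |
| 3 | GLUS | 48.5 | No | GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | 2025-04-10 | Yes |
| 4 | VRS-HQ (Chat-UniVi-13B) | 48 | No | The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | 2025-01-15 | Yes |
| 5 | SAMWISE | 45.4 | No | SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | 2024-11-26 | Yes |
| 6 | ReferDINO (Swin-B) | 44.7 | No | ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | 2025-01-24 | No |
| 7 | DsHmp + MTCM | 44.1 | No | Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation | 2025-01-09 | Yes |
| 8 | DsHmp | 43 | No | Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | 2024-04-04 | Yes |
| 9 | HTR | 39.9 | No | Temporally Consistent Referring Video Object Segmentation with Hybrid Memory | 2024-03-28 | Yes |
| 10 | LMPM | 34.2 | No | MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | 2023-08-16 | Yes |
| 11 | VLT+TC | 33.6 | No | VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | 2022-10-28 | Yes |
| 12 | ReferFormer | 29.8 | No | Language as Queries for Referring Video Object Segmentation | 2022-01-03 | Yes |
| 13 | MTTR | 28.8 | No | End-to-End Referring Video Object Segmentation with Multimodal Transformers | 2021-11-29 | Yes |
| 14 | LBDT | 27.8 | No | Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation | 2022-06-08 | Yes |
| 15 | URVOS | 25.7 | No | - | - | Yes |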
#1 MPG-SAM 2 (SOTA) · 50.7 J · 2025-01-23 · MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation · Code
#2 FindTrack · 50.5 J · 2025-03-05 · Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation · Code
#3 GLUS · 48.5 J · 2025-04-10 · GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation · Code
#4 VRS-HQ (Chat-UniVi-13B) (SOTA) · 48 J · 2025-01-15 · The Devil is in Temporal Token: High Quality Video Reasoning Segmentation · Code
#5 SAMWISE (SOTA) · 45.4 J · 2024-11-26 · SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation · Code
#6 ReferDINO (Swin-B) · 44.7 J · 2025-01-24 · ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
#7 DsHmp + MTCM · 44.1 J · 2025-01-09 · Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation · Code
#8 DsHmp (SOTA) · 43 J · 2024-04-04 · Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation · Code
#9 HTR (SOTA) · 39.9 J · 2024-03-28 · Temporally Consistent Referring Video Object Segmentation with Hybrid Memory · Code
#10 LMPM (SOTA) · 34.2 J · 2023-08-16 · MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions · Code
#11 VLT+TC (SOTA) · 33.6 J · 2022-10-28 · VLT: Vision-Language Transformer and Query Generation for Referring Segmentation · Code
#12 ReferFormer (SOTA) · 29.8 J · 2022-01-03 · Language as Queries for Referring Video Object Segmentation · Code
#13 MTTR (SOTA) · 28.8 J · 2021-11-29 · End-to-End Referring Video Object Segmentation with Multimodal Transformers · Code
#14 LBDT · 27.8 J · 2022-06-08 · Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation · Code
#15 URVOS · 25.7 J · No paper · Code