Video on MeViS

Metric: F (higher is better)

Results

#	Model	F	Extra Data	Paper	Date	Code
1	MPG-SAM 2	56.7	No	MPG-SAM 2: Adapting SAM 2 with Mask Priors and G...	2025-01-23	Code
2	FindTrack	55.9	No	Find First, Track Next: Decoupling Identificatio...	2025-03-05	Code
3	GLUS	54.2	No	GLUS: Global-Local Reasoning Unified into A Sing...	2025-04-10	Code
4	ReferDINO (Swin-B)	53.9	No	ReferDINO: Referring Video Object Segmentation w...	2025-01-24	-
5	VRS-HQ (Chat-UniVi-13B)	53.7	No	The Devil is in Temporal Token: High Quality Vid...	2025-01-15	Code
6	SAMWISE	51.2	No	SAMWISE: Infusing Wisdom in SAM2 for Text-Driven...	2024-11-26	Code
7	DsHmp + MTCM	51.1	No	Multi-Context Temporal Consistent Modeling for R...	2025-01-09	Code
8	DsHmp	49.8	No	Decoupling Static and Hierarchical Motion Percep...	2024-04-04	Code
9	HTR	45.5	No	Temporally Consistent Referring Video Object Seg...	2024-03-28	Code
10	LMPM	40.2	No	MeViS: A Large-scale Benchmark for Video Segment...	2023-08-16	Code
11	VLT+TC	37.3	No	VLT: Vision-Language Transformer and Query Gener...	2022-10-28	Code
12	ReferFormer	32.2	No	Language as Queries for Referring Video Object S...	2022-01-03	Code
13	MTTR	31.2	No	End-to-End Referring Video Object Segmentation w...	2021-11-29	Code
14	LBDT	30.8	No	Language-Bridged Spatial-Temporal Interaction fo...	2022-06-08	Code
15	URVOS	29.9	No	-	-	Code

#1 MPG-SAM 2 (SOTA)
F: 56.7 · 2025-01-23
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation (Code)

#2 FindTrack
F: 55.9 · 2025-03-05
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation (Code)

#3 GLUS
F: 54.2 · 2025-04-10
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation (Code)

#4 ReferDINO (Swin-B)
F: 53.9 · 2025-01-24
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations

#5 VRS-HQ (Chat-UniVi-13B) (SOTA)
F: 53.7 · 2025-01-15
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation (Code)

#6 SAMWISE (SOTA)
F: 51.2 · 2024-11-26
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation (Code)

#7 DsHmp + MTCM
F: 51.1 · 2025-01-09
Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation (Code)

#8 DsHmp (SOTA)
F: 49.8 · 2024-04-04
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation (Code)

#9 HTR (SOTA)
F: 45.5 · 2024-03-28
Temporally Consistent Referring Video Object Segmentation with Hybrid Memory (Code)

#10 LMPM (SOTA)
F: 40.2 · 2023-08-16
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions (Code)

#11 VLT+TC (SOTA)
F: 37.3 · 2022-10-28
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation (Code)

#12 ReferFormer (SOTA)
F: 32.2 · 2022-01-03
Language as Queries for Referring Video Object Segmentation (Code)

#13 MTTR (SOTA)
F: 31.2 · 2021-11-29
End-to-End Referring Video Object Segmentation with Multimodal Transformers (Code)

#14 LBDT
F: 30.8 · 2022-06-08
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation (Code)

#15 URVOS
F: 29.9
No paper (Code)
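
The SOTA badges above look consistent with a running maximum of F over publication date: an entry is marked when its score beat every earlier-dated entry. That reading is an assumption, not something the page states. A minimal sketch that recomputes the marked entries from the table's scores and dates follows; the names `RESULTS` and `record_setters` are hypothetical, and URVOS is left out because no date is listed for it.

```python
from datetime import date

# (model, F score, publication date) copied from the results table.
# URVOS is omitted because the leaderboard lists no date for it.
RESULTS = [
    ("MPG-SAM 2", 56.7, date(2025, 1, 23)),
    ("FindTrack", 55.9, date(2025, 3, 5)),
    ("GLUS", 54.2, date(2025, 4, 10)),
    ("ReferDINO (Swin-B)", 53.9, date(2025, 1, 24)),
    ("VRS-HQ (Chat-UniVi-13B)", 53.7, date(2025, 1, 15)),
    ("SAMWISE", 51.2, date(2024, 11, 26)),
    ("DsHmp + MTCM", 51.1, date(2025, 1, 9)),
    ("DsHmp", 49.8, date(2024, 4, 4)),
    ("HTR", 45.5, date(2024, 3, 28)),
    ("LMPM", 40.2, date(2023, 8, 16)),
    ("VLT+TC", 37.3, date(2022, 10, 28)),
    ("ReferFormer", 32.2, date(2022, 1, 3)),
    ("MTTR", 31.2, date(2021, 11, 29)),
    ("LBDT", 30.8, date(2022, 6, 8)),
]


def record_setters(results):
    """Return, in date order, the models whose F beat all earlier-dated entries."""
    best = float("-inf")
    setters = []
    for model, f_score, _ in sorted(results, key=lambda row: row[2]):
        if f_score > best:
            best = f_score
            setters.append(model)
    return setters


if __name__ == "__main__":
    for name in record_setters(RESULTS):
        print(name)
```

Under this assumption the function returns the same nine models that carry a SOTA badge in the listing, oldest first.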