Video Object Segmentation on MeViS

Metric: J&F, the mean of region similarity J and contour accuracy F, reported in percent (higher is better)
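J&F averages two per-frame scores: region similarity J, the intersection over union (IoU) of the predicted and ground-truth masks, and contour accuracy F, an F-measure of how well the two mask boundaries line up. The sketch below computes a video-level J&F from lists of binary masks. It is a simplified illustration, not the benchmark's evaluation code: it assumes masks are 2-D lists of booleans, treats two empty masks as a perfect match, and matches boundary pixels within a fixed square window of `tol` pixels, whereas DAVIS-style reference code ties the tolerance to the image size. All function names, including the `square` helper, are hypothetical.

```python
from typing import List, Set, Tuple

Mask = List[List[bool]]    # one frame: rows of booleans, True = object pixel
Pixel = Tuple[int, int]    # (row, col)


def to_pixels(mask: Mask) -> Set[Pixel]:
    """Collect the coordinates of all foreground pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def region_similarity(pred: Mask, gt: Mask) -> float:
    """J: intersection over union of the two masks."""
    p, g = to_pixels(pred), to_pixels(gt)
    union = p | g
    if not union:
        return 1.0  # assumption: both masks empty counts as a perfect match
    return len(p & g) / len(union)


def boundary(pixels: Set[Pixel]) -> Set[Pixel]:
    """Foreground pixels with at least one 8-neighbour outside the mask."""
    return {
        (r, c)
        for (r, c) in pixels
        if any((r + dr, c + dc) not in pixels
               for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    }


def count_matched(src: Set[Pixel], dst: Set[Pixel], tol: int) -> int:
    """Count src pixels that have a dst pixel inside a (2*tol+1)^2 window."""
    return sum(
        1
        for (r, c) in src
        if any((r + dr, c + dc) in dst
               for dr in range(-tol, tol + 1) for dc in range(-tol, tol + 1))
    )


def contour_accuracy(pred: Mask, gt: Mask, tol: int = 2) -> float:
    """F: F-measure of boundary precision and recall.

    Simplification: a fixed square tolerance of `tol` pixels instead of
    a tolerance scaled to the image size.
    """
    pb, gb = boundary(to_pixels(pred)), boundary(to_pixels(gt))
    if not pb and not gb:
        return 1.0  # assumption: no boundary on either side is a perfect match
    if not pb or not gb:
        return 0.0
    precision = count_matched(pb, gb, tol) / len(pb)
    recall = count_matched(gb, pb, tol) / len(gb)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(preds: List[Mask], gts: List[Mask], tol: int = 2) -> float:
    """Video-level J&F: mean J over frames and mean F over frames, then averaged."""
    js = [region_similarity(p, g) for p, g in zip(preds, gts)]
    fs = [contour_accuracy(p, g, tol) for p, g in zip(preds, gts)]
    return (sum(js) / len(js) + sum(fs) / len(fs)) / 2


def square(h: int, w: int, r0: int, r1: int, c0: int, c1: int) -> Mask:
    """Hypothetical helper: an h x w mask with rows r0..r1-1 and cols c0..c1-1 set."""
    return [[r0 <= r < r1 and c0 <= c < c1 for c in range(w)] for r in range(h)]


if __name__ == "__main__":
    gt = square(20, 20, 5, 10, 5, 10)
    pred = square(20, 20, 5, 10, 6, 11)  # same square shifted one column right
    print(region_similarity(pred, gt), contour_accuracy(pred, gt), j_and_f([pred], [gt]))
```

Scores here fall in [0, 1]; the leaderboard numbers are the same kind of quantity scaled to percent and averaged over all evaluated videos and expressions. Set-based pixel handling keeps the sketch dependency-free; a real evaluator would use array operations for speed.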

Results

#	Model	J&F	Extra Data	Paper	Date	Code
1	MPG-SAM 2	53.7	No	MPG-SAM 2: Adapting SAM 2 with Mask Priors and G...	2025-01-23	Code
2	FindTrack	53.2	No	Find First, Track Next: Decoupling Identificatio...	2025-03-05	Code
3	GLUS	51.3	No	GLUS: Global-Local Reasoning Unified into A Sing...	2025-04-10	Code
4	VRS-HQ (Chat-UniVi-13B)	50.9	No	The Devil is in Temporal Token: High Quality Vid...	2025-01-15	Code
5	ReferDINO (Swin-B)	49.3	No	ReferDINO: Referring Video Object Segmentation w...	2025-01-24	-
6	SAMWISE	48.3	No	SAMWISE: Infusing Wisdom in SAM2 for Text-Driven...	2024-11-26	Code
7	DsHmp + MTCM	47.6	No	Multi-Context Temporal Consistent Modeling for R...	2025-01-09	Code
8	DsHmp	46.4	No	Decoupling Static and Hierarchical Motion Percep...	2024-04-04	Code
9	HTR	42.7	No	Temporally Consistent Referring Video Object Seg...	2024-03-28	Code
10	LMPM	37.2	No	MeViS: A Large-scale Benchmark for Video Segment...	2023-08-16	Code
11	VLT+TC	35.5	No	VLT: Vision-Language Transformer and Query Gener...	2022-10-28	Code
12	InternVideo2.5	32	No	InternVideo2.5: Empowering Video MLLMs with Long...	2025-01-21	Code
13	ReferFormer	31	No	Language as Queries for Referring Video Object S...	2022-01-03	Code
14	MTTR	30	No	End-to-End Referring Video Object Segmentation w...	2021-11-29	Code
15	LBDT	29.3	No	Language-Bridged Spatial-Temporal Interaction fo...	2022-06-08	Code
16	URVOS	27.8	No	-	-	Code

#1 MPG-SAM 2 (SOTA)
53.7 J&F · 2025-01-23
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation · Code
#2 FindTrack
53.2 J&F · 2025-03-05
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation · Code
#3 GLUS
51.3 J&F · 2025-04-10
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation · Code
#4 VRS-HQ (Chat-UniVi-13B) (SOTA)
50.9 J&F · 2025-01-15
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation · Code
#5 ReferDINO (Swin-B)
49.3 J&F · 2025-01-24
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
#6 SAMWISE (SOTA)
48.3 J&F · 2024-11-26
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation · Code
#7 DsHmp + MTCM
47.6 J&F · 2025-01-09
Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation · Code
#8 DsHmp (SOTA)
46.4 J&F · 2024-04-04
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation · Code
#9 HTR (SOTA)
42.7 J&F · 2024-03-28
Temporally Consistent Referring Video Object Segmentation with Hybrid Memory · Code
#10 LMPM (SOTA)
37.2 J&F · 2023-08-16
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions · Code
#11 VLT+TC (SOTA)
35.5 J&F · 2022-10-28
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation · Code
#12 InternVideo2.5
32 J&F · 2025-01-21
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling · Code
#13 ReferFormer (SOTA)
31 J&F · 2022-01-03
Language as Queries for Referring Video Object Segmentation · Code
#14 MTTR (SOTA)
30 J&F · 2021-11-29
End-to-End Referring Video Object Segmentation with Multimodal Transformers · Code
#15 LBDT
29.3 J&F · 2022-06-08
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation · Code
#16 URVOS
27.8 J&F
No paper · Code
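Read against the results table, the SOTA tags in the per-model list follow a running-best pattern: each tagged entry scores a higher J&F than every entry with an earlier date, and no untagged dated entry does. The sketch below reproduces that rule from the table's numbers. The rule is inferred from the listed values rather than taken from a documented leaderboard definition, and the names `ENTRIES` and `running_best` are illustrative.

```python
from datetime import date
from typing import List, Optional, Tuple

# (model, J&F, date) copied from the results table; URVOS is listed without a date.
Entry = Tuple[str, float, Optional[date]]
ENTRIES: List[Entry] = [
    ("MPG-SAM 2", 53.7, date(2025, 1, 23)),
    ("FindTrack", 53.2, date(2025, 3, 5)),
    ("GLUS", 51.3, date(2025, 4, 10)),
    ("VRS-HQ (Chat-UniVi-13B)", 50.9, date(2025, 1, 15)),
    ("ReferDINO (Swin-B)", 49.3, date(2025, 1, 24)),
    ("SAMWISE", 48.3, date(2024, 11, 26)),
    ("DsHmp + MTCM", 47.6, date(2025, 1, 9)),
    ("DsHmp", 46.4, date(2024, 4, 4)),
    ("HTR", 42.7, date(2024, 3, 28)),
    ("LMPM", 37.2, date(2023, 8, 16)),
    ("VLT+TC", 35.5, date(2022, 10, 28)),
    ("InternVideo2.5", 32.0, date(2025, 1, 21)),
    ("ReferFormer", 31.0, date(2022, 1, 3)),
    ("MTTR", 30.0, date(2021, 11, 29)),
    ("LBDT", 29.3, date(2022, 6, 8)),
    ("URVOS", 27.8, None),
]


def running_best(entries: List[Entry]) -> List[str]:
    """Models whose J&F beats every entry dated before them, in date order.

    Undated entries are skipped because they cannot be placed on the timeline.
    """
    dated = sorted((e for e in entries if e[2] is not None), key=lambda e: e[2])
    best = float("-inf")
    leaders = []
    for model, score, _ in dated:
        if score > best:  # strictly better than everything published earlier
            leaders.append(model)
            best = score
    return leaders


if __name__ == "__main__":
    for model in running_best(ENTRIES):
        print(model)
```

On these entries the rule selects exactly the nine models tagged SOTA above, from MTTR (2021-11-29) to MPG-SAM 2 (2025-01-23); FindTrack and GLUS rank second and third overall but are not tagged because MPG-SAM 2 was already listed with a higher score.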