Metric: J (higher is better)
| # | Model | J | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | VRS-HQ (Chat-UniVi-13B) | 57.6 | No | The Devil is in Temporal Token: High Quality Vid... | 2025-01-15 | Code |
| 2 | VRS-HQ (Chat-UniVi-7B) | 56.6 | No | The Devil is in Temporal Token: High Quality Vid... | 2025-01-15 | Code |
| 3 | VISA (Chat-UniVi-13B) | 48.8 | No | VISA: Reasoning Video Object Segmentation via La... | 2024-07-16 | Code |
| 4 | VISA (Chat-UniVi-7B) | 44.9 | No | VISA: Reasoning Video Object Segmentation via La... | 2024-07-16 | Code |
| 5 | TrackGPT (LLaVA-13B) | 43.2 | No | Tracking with Human-Intent Reasoning | 2023-12-29 | Code |
| 6 | LISA (LLaVA-13B) | 39.8 | No | LISA: Reasoning Segmentation via Large Language ... | 2023-08-01 | Code |
| 7 | ReferFormer (Video-Swin-B) | 26.2 | No | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 8 | MTTR (Video-Swin-T) | 25.1 | No | End-to-End Referring Video Object Segmentation w... | 2021-11-29 | Code |
| 9 | LMPM (Swin-T) | 21.2 | No | MeViS: A Large-scale Benchmark for Video Segment... | 2023-08-16 | Code |