Metric: R (higher is better)
| # | Model | R | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | VRS-HQ (Chat-UniVi-7B) | 19.7 | No | The Devil is in Temporal Token: High Quality Vid... | 2025-01-15 | Code |
| 2 | VRS-HQ (Chat-UniVi-13B) | 18.9 | No | The Devil is in Temporal Token: High Quality Vid... | 2025-01-15 | Code |
| 3 | VISA (Chat-UniVi-7B) | 15.5 | No | VISA: Reasoning Video Object Segmentation via La... | 2024-07-16 | Code |
| 4 | VISA (Chat-UniVi-13B) | 14.5 | No | VISA: Reasoning Video Object Segmentation via La... | 2024-07-16 | Code |
| 5 | TrackGPT (LLaVA-13B) | 12.8 | No | Tracking with Human-Intent Reasoning | 2023-12-29 | Code |
| 6 | ReferFormer (Video-Swin-B) | 8.8 | No | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 7 | LISA (LLaVA-13B) | 8.6 | No | LISA: Reasoning Segmentation via Large Language ... | 2023-08-01 | Code |
| 8 | MTTR (Video-Swin-T) | 5.6 | No | End-to-End Referring Video Object Segmentation w... | 2021-11-29 | Code |
| 9 | LMPM (Swin-T) | 3.2 | No | MeViS: A Large-scale Benchmark for Video Segment... | 2023-08-16 | Code |