Metric: R@1,IoU=0.5 (higher is better)
| # | Model | R@1,IoU=0.5 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GVL (paragraph-level) | 60.67 | No | Learning Grounded Vision-Language Representation... | 2023-03-11 | Code |
| 2 | LLaVA-MR | 55.16 | No | LLaVA-MR: Large Language-and-Vision Assistant fo... | 2024-11-21 | Code |
| 3 | GVL | 49.18 | No | Learning Grounded Vision-Language Representation... | 2023-03-11 | Code |
| 4 | UnLoc-L | 48.3 | No | UnLoc: A Unified Framework for Video Localizatio... | 2023-08-21 | Code |
| 5 | UnLoc-B | 48 | No | UnLoc: A Unified Framework for Video Localizatio... | 2023-08-21 | Code |
| 6 | VLG-Net | 46.32 | No | VLG-Net: Video-Language Graph Matching Network f... | 2020-11-19 | Code |
| 7 | DRN | 45.45 | No | Dense Regression Network for Video Grounding | 2020-04-07 | Code |
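
For reference, R@1,IoU=0.5 is commonly computed as the percentage of queries whose top-ranked predicted segment overlaps the ground-truth segment with a temporal IoU of at least 0.5. The sketch below assumes that definition; the function names and the `(start, end)` tuple format are illustrative and are not taken from any of the listed papers' code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments, e.g. in seconds."""
    # Overlap is clamped at zero for disjoint segments.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, iou_threshold=0.5):
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(
        temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)


# Hypothetical example: three queries, two of which pass the 0.5 threshold.
preds = [(0.0, 10.0), (5.0, 8.0), (20.0, 30.0)]
gts = [(0.0, 8.0), (0.0, 4.0), (22.0, 30.0)]
score = recall_at_1(preds, gts)
```

Individual papers may differ in details such as how ties or empty predictions are handled, so reported scores should be compared with each paper's own evaluation protocol in mind.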