Metric: Generation (higher is better)
| # | Model | Generation | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | SoFar | 70.88 | No | SoFar: Language-Grounded Orientation Bridges Spa... | 2025-02-18 | Code |
| 2 | Qwen-VL-Max | 49.11 | No | Qwen-VL: A Versatile Vision-Language Model for U... | 2023-08-24 | Code |
| 3 | GPT-4V | 36.07 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 4 | LLaVA-1.6 | 35.19 | No | Visual Instruction Tuning | 2023-04-17 | Code |
| 5 | MiniGPT4 | 23.54 | No | MiniGPT-4: Enhancing Vision-Language Understandi... | 2023-04-20 | Code |