Metric: Accuracy (%, higher is better)
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | LLaVA-OneVision-7B w. FOCUS | 92.15 | No | - | - | - |
| 2 | LLaVA-OneVision-7B w. ZoomEye | 90.58 | No | ZoomEye: Enhancing Multimodal LLMs with Human-Li... | 2024-11-25 | Code |
| 3 | IVM-Enhanced GPT4-V | 81.2 | No | Instruction-Guided Visual Masking | 2024-05-30 | Code |
| 4 | SEAL | 75.39 | Yes | V*: Guided Visual Search as a Core Mechanism in ... | 2023-12-21 | Code |
| 5 | LLaVA-OneVision-7B | 74.46 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code |