Metric: GPT-4 score (human); higher is better.
| # | Model | GPT-4 score (human) | Extra data | Paper | Date |
|---|---|---|---|---|---|
| 1 | GPT-4V-turbo-detail:high (Visual Prompt) | 59.9 | No | GPT-4 Technical Report | 2023-03-15 |
| 2 | GPT-4V-turbo-detail:low (Visual Prompt) | 51.4 | No | GPT-4 Technical Report | 2023-03-15 |
| 3 | LLaVA-NeXT-Inst-IT-Qwen2-7B (Visual Prompt) | 49 | Yes | Inst-IT: Boosting Multimodal Instance Understand... | 2024-12-04 |
| 4 | ViP-LLaVA-13B (Visual Prompt) | 48.2 | No | Making Large Language Models Better Data Creators | 2023-10-31 |
| 5 | LLaVA-NeXT-Inst-IT-Vicuna-7B (Visual Prompt) | 48.2 | Yes | Inst-IT: Boosting Multimodal Instance Understand... | 2024-12-04 |
| 6 | LLaVA-1.5-13B (Visual Prompt) | 42.9 | No | Improved Baselines with Visual Instruction Tuning | 2023-10-05 |
| 7 | Qwen-VL-Chat (Visual Prompt) | 41.7 | No | Qwen-VL: A Versatile Vision-Language Model for U... | 2023-08-24 |
| 8 | InstructBLIP-13B (Visual Prompt) | 35.2 | No | InstructBLIP: Towards General-purpose Vision-Lan... | 2023-05-11 |