Metric: 10 Images, 1*1 Stitching, Exact Accuracy (higher is better)
| # | Model | 10 Images, 1*1 Stitching, Exact Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT-4o | 97 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 2 | Gemini Pro 1.5 | 89.94 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 | Code |
| 3 | GPT-4V | 72.36 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 4 | Claude 3 Opus | 66.93 | No | - | - | - |
| 5 | Gemini Pro 1.0 | 16.25 | No | Gemini: A Family of Highly Capable Multimodal Mo... | 2023-12-19 | Code |
| 6 | mPLUG-Owl-v2 | 0.4 | No | mPLUG-Owl2: Revolutionizing Multi-modal Large La... | 2023-11-07 | Code |
| 7 | LLaVA-Llama-3 | 0 | No | - | - | Code |
| 8 | IDEFICS2-8B | 0 | No | - | - | - |
| 9 | InstructBLIP-Flan-T5-XXL | 0 | No | - | - | Code |
| 10 | CogVLM2-Llama-3 | 0 | No | - | - | Code |
| 11 | CogVLM-17B | 0 | No | - | - | Code |
| 12 | InstructBLIP-Vicuna-13B | 0 | No | - | - | Code |