Metric: Accuracy (higher is better)
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | RA-VQAv2 w/ PreFLMR | 30.65 | No | PreFLMR: Scaling Up Fine-Grained Late-Interactio... | 2024-02-13 | Code |
| 2 | PaLI-X | 24 | No | PaLI-X: On Scaling up a Multilingual Vision and ... | 2023-05-29 | Code |
| 3 | CLIP + FiD | 20.9 | No | Can Pre-trained Vision and Language Models Answe... | 2023-02-23 | Code |
| 4 | CLIP + PaLM (540B) | 20.4 | No | Can Pre-trained Vision and Language Models Answe... | 2023-02-23 | Code |
| 5 | PaLI | 19.7 | No | Can Pre-trained Vision and Language Models Answe... | 2023-02-23 | Code |
| 6 | BLIP2 | 14.6 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 7 | InstructBLIP | 14.5 | No | - | - | - |