Metric: Accuracy (higher is better)
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | BLIP-2 ViT-G OPT 6.7B (fine-tuned) | 82.19 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 2 | BLIP-2 ViT-G OPT 2.7B (fine-tuned) | 81.59 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 3 | BLIP-2 ViT-G FlanT5 XL (fine-tuned) | 81.55 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 4 | LocVLM-L | 55.9 | No | Learning to Localize Objects Improves Spatial Re... | 2024-04-11 | Code |