Metric: recall@1 (higher is better)
| # | Model | recall@1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | BLIP-2 ViT-G (fine-tuned) | 68.3 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 2 | VisualSparta | 68.2 | No | VisualSparta: An Embarrassingly Simple Approach ... | 2021-01-01 | Code |
| 3 | BLIP-2 ViT-L (fine-tuned) | 66.3 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 4 | FLAVA (zero-shot) | 38.38 | No | FLAVA: A Foundational Language And Vision Alignm... | 2021-12-08 | Code |
| 5 | CLIP (zero-shot) | 33.29 | No | FLAVA: A Foundational Language And Vision Alignm... | 2021-12-08 | Code |
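
For readers unfamiliar with the metric, recall@1 in a retrieval setting is the fraction of queries whose top-ranked candidate is a correct match. The sketch below is only an illustration under stated assumptions: the function name `recall_at_1`, the toy similarity scores, and the percentage scaling are all hypothetical, and the leaderboard does not say how each paper ran its evaluation.

```python
def recall_at_1(similarity, relevant):
    """Return recall@1 as a percentage.

    similarity: one row per query, each row holding a score per candidate
                (higher score = better match). Hypothetical input layout.
    relevant:   one set per query with the indices of the correct candidates.
    """
    hits = 0
    for scores, gold in zip(similarity, relevant):
        # Index of the highest-scoring candidate for this query.
        top = max(range(len(scores)), key=scores.__getitem__)
        # Count a hit when the top-ranked candidate is a correct match.
        hits += top in gold
    # Assumption: scores are reported on a 0-100 scale, as the table suggests.
    return 100.0 * hits / len(similarity)


# Toy data: 4 queries scored against 3 candidates (made-up numbers).
similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
    [0.7, 0.2, 0.6],
]
relevant = [{0}, {1}, {1}, {0}]

score = recall_at_1(similarity, relevant)
print(score)
```

In this toy case three of the four queries rank a correct candidate first. Real evaluations typically compute the similarity matrix from model embeddings over the full test set.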