Metric: Recall@5 (higher is better)
| # | Model | Recall@5 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | BLIP-2 (ViT-G, fine-tuned) | 97 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 2 | ONE-PEACE (ViT-G, w/o ranking) | 96.3 | No | ONE-PEACE: Exploring One General Representation ... | 2023-05-18 | Code |
| 3 | BLIP-2 (ViT-L, fine-tuned) | 96 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 4 | IAIS | 89.7 | No | Learning Relation Alignment for Calibrated Cross... | 2021-05-28 | Code |
| 5 | CLIP (zero-shot) | 81.5 | No | Learning Transferable Visual Models From Natural... | 2021-02-26 | Code |
| 6 | FLAVA (ViT-B, zero-shot) | 76.76 | No | FLAVA: A Foundational Language And Vision Alignm... | 2021-12-08 | Code |
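
For reference, Recall@5 in retrieval benchmarks is commonly computed as the percentage of queries for which at least one correct item appears among the top five ranked results. The sketch below follows that common definition. The table does not state the exact evaluation protocol used by each paper, so treat this as an illustration only. The function name `recall_at_k` and its argument layout are hypothetical and do not come from any of the listed codebases.

```python
from typing import Sequence, Set


def recall_at_k(
    ranked: Sequence[Sequence[int]],
    relevant: Sequence[Set[int]],
    k: int = 5,
) -> float:
    """Percentage of queries with at least one relevant item in the top-k.

    ranked[i]   : candidate ids for query i, best match first (assumed layout)
    relevant[i] : set of ground-truth ids for query i (assumed layout)
    """
    if len(ranked) != len(relevant):
        raise ValueError("ranked and relevant must have the same length")
    if len(ranked) == 0:
        raise ValueError("at least one query is required")

    # A query counts as a hit if any of its top-k candidates is relevant.
    hits = sum(
        1
        for candidates, truth in zip(ranked, relevant)
        if any(item in truth for item in candidates[:k])
    )
    return 100.0 * hits / len(ranked)


if __name__ == "__main__":
    # Toy data: query 0 has its match at rank 5, query 1 at rank 6.
    ranked = [[3, 1, 2, 7, 9, 4], [5, 6, 8, 0, 2, 1]]
    relevant = [{9}, {1}]
    print(recall_at_k(ranked, relevant, k=5))
```

With the toy data above, only the first query has a match within the top five, so half of the queries count as hits. Scores in the table are reported on the same 0 to 100 scale.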