Metric: Recall@10 (higher is better)
| Rank | Model | Recall@10 (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Oscar | 99.8 | No | Oscar: Object-Semantics Aligned Pre-training for... | 2020-04-13 | Code |
| 2 | BLIP-2 (ViT-G, fine-tuned) | 98.5 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 3 | ONE-PEACE (ViT-G, w/o ranking) | 98.3 | No | ONE-PEACE: Exploring One General Representation ... | 2023-05-18 | Code |
| 4 | BLIP-2 (ViT-L, fine-tuned) | 98.0 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 5 | Unicoder-VL | 97.2 | No | Unicoder-VL: A Universal Encoder for Vision and ... | 2019-08-16 | - |
| 6 | IAIS | 94.48 | No | Learning Relation Alignment for Calibrated Cross... | 2021-05-28 | Code |
| 7 | CLIP (zero-shot) | 88.1 | No | Learning Transferable Visual Models From Natural... | 2021-02-26 | Code |
| 8 | DVSA | 74.8 | No | Deep Visual-Semantic Alignments for Generating I... | 2014-12-07 | Code |
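
For readers unfamiliar with the metric, Recall@K in retrieval is commonly computed as the percentage of queries for which at least one relevant item appears among the top K ranked candidates. The sketch below illustrates that common definition only. The table does not state the dataset, the retrieval direction, or the evaluation protocol behind the listed scores, so the function name `recall_at_k`, its inputs, and the toy scores are all hypothetical and are not taken from any of the listed papers.

```python
from typing import Sequence, Set


def recall_at_k(
    scores: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
    k: int = 10,
) -> float:
    """Percentage of queries with at least one relevant item in the top k.

    scores[i][j] is an (assumed) similarity between query i and candidate j.
    relevant[i] is the set of candidate indices that are correct for query i.
    """
    if len(scores) != len(relevant):
        raise ValueError("scores and relevant must have one entry per query")
    if not scores:
        raise ValueError("need at least one query")
    hits = 0
    for row, gold in zip(scores, relevant):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # Count the query as a hit if any relevant index is in the top k.
        if gold.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(scores)


# Toy data (made up): 3 queries, 4 candidates each.
toy_scores = [
    [0.9, 0.1, 0.3, 0.2],  # relevant candidate 0 is ranked 1st
    [0.2, 0.4, 0.8, 0.6],  # relevant candidate 1 is ranked 3rd
    [0.5, 0.7, 0.1, 0.6],  # relevant candidate 2 is ranked 4th
]
toy_relevant = [{0}, {1}, {2}]

r_at_1 = recall_at_k(toy_scores, toy_relevant, k=1)
r_at_3 = recall_at_k(toy_scores, toy_relevant, k=3)
```

On the toy data, one of three queries is a hit at K=1 and two of three at K=3, which shows why Recall@10 is a more lenient figure than Recall@1 for the same model.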