Metric: Recall@10 (higher is better)
| # | Model | Recall@10 (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | BLIP-2 ViT-G (zero-shot, 1K test set) | 98.9 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 2 | BLIP-2 ViT-L (zero-shot, 1K test set) | 98.9 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 3 | HADA | 98.02 | No | HADA: A Graph-based Amalgamation Framework in Im... | 2023-01-11 | Code |
| 4 | MaMMUT | 98 | No | MaMMUT: A Simple Architecture for Joint Learning... | 2023-03-29 | Code |
| 5 | ALBEF | 97.72 | No | HADA: A Graph-based Amalgamation Framework in Im... | 2023-01-11 | Code |
| 6 | UNITER | 96.76 | No | HADA: A Graph-based Amalgamation Framework in Im... | 2023-01-11 | Code |
| 7 | LGSGM | 90.2 | No | A Deep Local and Global Scene-Graph Matching for... | 2021-06-04 | Code |
| 8 | GSMN | 89 | No | Graph Structured Network for Image-Text Matching | 2020-04-01 | Code |
| 9 | VisualSparta | 88.1 | No | VisualSparta: An Embarrassingly Simple Approach ... | 2021-01-01 | Code |
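Recall@10 is the percentage of queries for which a correct match appears among the model's 10 highest-scoring candidates. The leaderboard does not name the dataset or the retrieval direction, so the sketch below is a generic illustration and not any listed paper's evaluation code. It assumes a query-by-candidate similarity matrix and exactly one ground-truth candidate index per query. Some image-text benchmarks pair each image with several captions and count a hit if any of them is retrieved. The function name `recall_at_k` and its parameters are hypothetical.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[Sequence[float]], gt: Sequence[int], k: int = 10) -> float:
    """Return the percentage of queries whose ground-truth candidate ranks in the top k.

    scores[i][j] is the similarity between query i and candidate j (higher = more similar).
    gt[i] is the index of the single correct candidate for query i (an assumption of this sketch).
    """
    if len(scores) != len(gt) or not gt:
        raise ValueError("scores and gt must be non-empty and the same length")
    hits = 0
    for row, target in zip(scores, gt):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(gt)


if __name__ == "__main__":
    # Toy example: 3 queries, 3 candidates.
    scores = [[0.9, 0.1, 0.5], [0.2, 0.3, 0.8], [0.4, 0.6, 0.1]]
    gt = [0, 0, 1]
    print(recall_at_k(scores, gt, k=1))
    print(recall_at_k(scores, gt, k=3))
```

With `k` equal to the number of candidates every query counts as a hit, which is why Recall@10 scores on small test sets can sit close to 100.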