Metric: Recall@5 (higher is better)
| # | Model | Recall@5 (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | BLIP-2 ViT-G (zero-shot, 1K test set) | 98.1 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 2 | BLIP-2 ViT-L (zero-shot, 1K test set) | 97.6 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 3 | MaMMUT | 96 | No | MaMMUT: A Simple Architecture for Joint Learning... | 2023-03-29 | Code |
| 4 | HADA | 95.94 | No | HADA: A Graph-based Amalgamation Framework in Im... | 2023-01-11 | Code |
| 5 | ALBEF | 95.3 | No | HADA: A Graph-based Amalgamation Framework in Im... | 2023-01-11 | Code |
| 6 | UNITER | 94.08 | No | HADA: A Graph-based Amalgamation Framework in Im... | 2023-01-11 | Code |
| 7 | LGSGM | 84.1 | No | A Deep Local and Global Scene-Graph Matching for... | 2021-06-04 | Code |
| 8 | GSMN | 82.3 | No | Graph Structured Network for Image-Text Matching | 2020-04-01 | Code |
| 9 | VisualSparta | 82 | No | VisualSparta: An Embarrassingly Simple Approach ... | 2021-01-01 | Code |
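
For readers unfamiliar with the metric, the following is a minimal sketch of how Recall@5 is commonly computed for retrieval tasks: the percentage of queries for which at least one correct item appears among the five highest-scoring candidates. It assumes this usual definition; the leaderboard does not state its exact evaluation protocol, and the function name `recall_at_k`, its parameters, and the toy scores are hypothetical illustrations, not taken from any of the listed papers.

```python
def recall_at_k(scores, relevant, k=5):
    """Percentage of queries with at least one correct candidate in the top k.

    scores:   one list of similarity scores per query (higher = better match)
    relevant: one set of correct candidate indices per query
    (Both argument names are illustrative assumptions.)
    """
    hits = 0
    for query_scores, correct in zip(scores, relevant):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(query_scores)),
                        key=lambda i: query_scores[i],
                        reverse=True)
        # Count the query as a hit if any correct candidate is in the top k.
        if correct & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(scores)


# Toy data: two queries, seven candidates each.
scores = [
    [0.9, 0.1, 0.3, 0.2, 0.5, 0.4, 0.0],  # correct candidate 0 ranks first
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.0],  # correct candidate 6 ranks last
]
relevant = [{0}, {6}]

print(recall_at_k(scores, relevant, k=5))  # 50.0
```

Only the first query finds its correct candidate within the top five, so the toy example yields 50.0. Under this definition a score such as 98.1 would mean that roughly 98 of every 100 queries retrieve a correct match in the top five.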