Metric: R@10 (higher is better)
| # | Model | R@10 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GLIP | 98.1 | Yes | Grounded Language-Image Pre-training | 2021-12-07 | Code |
| 2 | FIBER-B | 97.6 | Yes | Coarse-to-Fine Vision-Language Pre-training with... | 2022-06-15 | Code |
| 3 | MDETR-ENB5 | 95.8 | Yes | MDETR -- Modulated Detection for End-to-End Mult... | 2021-04-26 | Code |
| 4 | VisualBERT | 86.51 | No | VisualBERT: A Simple and Performant Baseline for... | 2019-08-09 | Code |
| 5 | BAN (Bottom-Up detector) | 86.35 | No | Bilinear Attention Networks | 2018-05-21 | Code |
| 6 | CCA - Fast RCNN | 70.77 | No | Flickr30k Entities: Collecting Region-to-Phrase ... | 2015-05-19 | Code |
| 7 | DSPE | 68.66 | No | Learning Deep Structure-Preserving Image-Text Em... | 2015-11-19 | - |
| 8 | CCA - VGG19 | 67.15 | No | Flickr30k Entities: Collecting Region-to-Phrase ... | 2015-05-19 | Code |
| 9 | SCRC | 62.9 | No | Natural Language Object Retrieval | 2015-11-13 | Code |
| 10 | CCA | 59.66 | No | Flickr30k Entities: Collecting Region-to-Phrase ... | 2015-05-19 | Code |
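
The R@10 column can be read as Recall@10 on a 0-100 scale: the percentage of queries for which at least one correct candidate appears among the top 10 ranked predictions. The sketch below illustrates that reading only. The function name `recall_at_k` and the input layout (one list of booleans per query, ordered best-ranked first) are assumptions made for illustration, and how a candidate counts as "correct" (for example, a box-overlap threshold) is left to each paper's evaluation protocol and is not specified by this table.

```python
from typing import Sequence


def recall_at_k(ranked_hits: Sequence[Sequence[bool]], k: int = 10) -> float:
    """Return the percentage of queries with a correct candidate in the top k.

    ranked_hits[i][j] is True when the j-th ranked candidate for query i is
    correct (hypothetical input layout, best-ranked candidate first).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not ranked_hits:
        raise ValueError("need at least one query")
    # A query counts as recalled if any of its first k candidates is correct.
    recalled = sum(1 for hits in ranked_hits if any(hits[:k]))
    return 100.0 * recalled / len(ranked_hits)


if __name__ == "__main__":
    # Three toy queries: correct at rank 2, never correct, correct at rank 1.
    toy = [[False, True], [False, False], [True]]
    print(f"R@1 = {recall_at_k(toy, k=1):.2f}")
    print(f"R@2 = {recall_at_k(toy, k=2):.2f}")
```

Under this reading, a larger k can only keep the score the same or raise it, which is why R@10 values sit at or above the corresponding R@1 and R@5 values for the same model.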