Metric: R@5 (Recall@5, in %; higher is better)
| # | Model | R@5 | Extra Data | Paper | Date | Code Available |
|---|---|---|---|---|---|---|
| 1 | GLIP | 96.9 | Yes | Grounded Language-Image Pre-training | 2021-12-07 | Yes |
| 2 | FIBER-B | 96.4 | Yes | Coarse-to-Fine Vision-Language Pre-training with... | 2022-06-15 | Yes |
| 3 | MDETR-ENB5 | 93.9 | Yes | MDETR -- Modulated Detection for End-to-End Mult... | 2021-04-26 | Yes |
| 4 | VisualBERT | 84.98 | No | VisualBERT: A Simple and Performant Baseline for... | 2019-08-09 | Yes |
| 5 | BAN (Bottom-Up detector) | 84.22 | No | Bilinear Attention Networks | 2018-05-21 | Yes |
| 6 | CCA - Fast RCNN | 64.52 | No | Flickr30k Entities: Collecting Region-to-Phrase ... | 2015-05-19 | Yes |
| 7 | DSPE | 64.46 | No | Learning Deep Structure-Preserving Image-Text Em... | 2015-11-19 | No |
| 8 | CCA - VGG19 | 58.01 | No | Flickr30k Entities: Collecting Region-to-Phrase ... | 2015-05-19 | Yes |
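Judging by the paper titles, this leaderboard appears to cover phrase grounding on Flickr30k Entities. On that task, R@K is commonly defined as the percentage of phrases for which at least one of the model's K highest-ranked boxes overlaps a ground-truth box with IoU ≥ 0.5. The sketch below assumes that definition. The names `iou` and `recall_at_k` are hypothetical helpers, not part of any listed paper's code. It counts a hit if any top-K box matches *any* ground-truth box for the phrase. Some papers instead merge a phrase's ground-truth boxes into their union before scoring, so numbers produced under different protocols may not be directly comparable.

```python
from typing import Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption of the sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    predictions: Sequence[Sequence[Box]],
    ground_truths: Sequence[Sequence[Box]],
    k: int = 5,
    iou_threshold: float = 0.5,
) -> float:
    """Percentage of phrases with a hit among the top-k predicted boxes.

    predictions[i] holds the boxes for phrase i, ranked by model score
    (best first); ground_truths[i] holds that phrase's ground-truth boxes.
    Assumed protocol: a hit is any top-k box with IoU >= iou_threshold
    against any ground-truth box of the phrase.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must align per phrase")
    if not predictions:
        return 0.0
    hits = 0
    for ranked, gts in zip(predictions, ground_truths):
        if any(iou(p, g) >= iou_threshold for p in ranked[:k] for g in gts):
            hits += 1
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Two toy phrases: the first has a matching box at rank 2, the second has none.
    preds = [[(0, 0, 10, 10), (20, 20, 30, 30)], [(0, 0, 1, 1)]]
    gts = [[(20, 20, 30, 30)], [(50, 50, 60, 60)]]
    print(recall_at_k(preds, gts, k=5))  # 50.0
    print(recall_at_k(preds, gts, k=1))  # 0.0
```

Because the score is a percentage of phrases, R@5 can only stay the same or rise as K grows. This is why R@5 values on a leaderboard like this one sit well above the corresponding R@1 figures that papers typically report alongside them.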