Metric: mean average precision (higher is better)
| # | Model | mean average precision | Augmentations | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | X-VLM | 28 | Yes | Multi-Grained Vision Language Pre-Training: Alig... | 2021-11-16 | Code |
| 2 | BLIP 2 (pretrained) | 25.5 | Yes | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 3 | BLIP | 24.3 | Yes | BLIP: Bootstrapping Language-Image Pre-training ... | 2022-01-28 | Code |
| 4 | OVAD-Baseline-Box | 21.4 | No | Open-vocabulary Attribute Detection | 2022-11-23 | Code |
| 5 | ALBEF | 21 | Yes | Align before Fuse: Vision and Language Represent... | 2021-07-16 | Code |
| 6 | Open CLIP ViT-B32 | 17 | Yes | Reproducible scaling laws for contrastive langua... | 2022-12-14 | Code |
| 7 | CLIP ViT-B16 | 16.6 | Yes | Learning Transferable Visual Models From Natural... | 2021-02-26 | Code |