Metric: Accuracy (%) (higher is better)
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Florence-2-large-ft | 95.3 | Yes | Florence-2: Advancing a Unified Representation f... | 2023-11-10 | Code |
| 2 | mPLUG-2 | 92.8 | No | mPLUG-2: A Modularized Multi-modal Foundation Mo... | 2023-02-01 | Code |
| 3 | X2-VLM (large) | 92.1 | No | X$^2$-VLM: All-In-One Pre-trained Model For Visi... | 2022-11-22 | Code |
| 4 | XFM (base) | 90.4 | No | Toward Building General Foundation Models for La... | 2023-01-12 | Code |
| 5 | X2-VLM (base) | 90.3 | No | X$^2$-VLM: All-In-One Pre-trained Model For Visi... | 2022-11-22 | Code |
| 6 | X-VLM (base) | 89.0 | No | Multi-Grained Vision Language Pre-Training: Alig... | 2021-11-16 | Code |