Metric: Accuracy (%) (higher is better)
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Florence-2-large-ft | 93.4 | Yes | Florence-2: Advancing a Unified Representation f... | 2023-11-10 | Code |
| 2 | mPLUG-2 | 90.33 | No | mPLUG-2: A Modularized Multi-modal Foundation Mo... | 2023-02-01 | Code |
| 3 | X2-VLM (large) | 87.6 | No | X$^2$-VLM: All-In-One Pre-trained Model For Visi... | 2022-11-22 | Code |
| 4 | XFM (base) | 86.1 | No | Toward Building General Foundation Models for La... | 2023-01-12 | Code |
| 5 | X2-VLM (base) | 85.2 | No | X$^2$-VLM: All-In-One Pre-trained Model For Visi... | 2022-11-22 | Code |
| 6 | X-VLM (base) | 84.51 | No | Multi-Grained Vision Language Pre-Training: Alig... | 2021-11-16 | Code |