Metric: WUPS (higher is better)
| # | Model | WUPS | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PaLI-X | 38.3 | Yes | PaLI-X: On Scaling up a Multilingual Vision and ... | 2023-05-29 | Code |
| 2 | PaLI-3 | 37.7 | Yes | PaLI-3 Vision Language Models: Smaller, Faster, ... | 2023-10-13 | Code |
| 3 | R2A | 34.7 | Yes | Retrieving-to-Answer: Zero-Shot Video Question A... | 2023-06-15 | - |
| 4 | Flamingo (32-shot) | 33.5 | Yes | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 5 | Gemini Ultra (zero-shot) | 29.9 | No | Gemini: A Family of Highly Capable Multimodal Mo... | 2023-12-19 | Code |
| 6 | Gemini Pro (zero-shot) | 28 | No | Gemini: A Family of Highly Capable Multimodal Mo... | 2023-12-19 | Code |
| 7 | Flamingo (0-shot) | 26.7 | Yes | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 8 | Emu (0-shot) | 23.4 | Yes | Emu: Generative Pretraining in Multimodality | 2023-07-11 | Code |