Visual Question Answering (VQA) on AI2D

Metric: EM (exact match; higher is better)

#  Model                          EM     Extra Data  Paper                                                Date        Code
1  SMoLA-PaLI-X Specialist Model  82.5   Yes         Omni-SMoLA: Boosting Generalist Multimodal Model...  2023-12-01  -
2  SMoLA-PaLI-X Generalist Model  81.4   Yes         Omni-SMoLA: Boosting Generalist Multimodal Model...  2023-12-01  -
3  Gemini Ultra                   79.5   No          Gemini: A Family of Highly Capable Multimodal Mo...  2023-12-19  Code
4  DUBLIN                         51.11  No          DUBLIN -- Document Understanding By Language-Ima...  2023-05-23  -