Metric: Image Context (higher is better)
| # | Model | Image Context | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | MC-CoT F-Large | 93.75 | No | Boosting the Power of Small Multimodal Reasoning... | 2023-11-23 | Code |
| 2 | Honeybee | 93.75 | Yes | Honeybee: Locality-enhanced Projector for Multim... | 2023-12-11 | Code |
| 3 | Multimodal CoT | 88.8 | No | Multimodal Chain-of-Thought Reasoning in Languag... | 2023-02-02 | Code |
| 4 | Chat-UniVi-13B | 88.05 | Yes | Chat-UniVi: Unified Visual Representation Empowe... | 2023-11-14 | Code |
| 5 | GPT-3 - CoT (QCM→ALE, 2-shot) | 67.43 | No | Learn to Explain: Multimodal Reasoning via Thoug... | 2022-09-20 | Code |
| 6 | GPT-3 (QCM→A, 2-shot) | 67.28 | No | Learn to Explain: Multimodal Reasoning via Thoug... | 2022-09-20 | Code |
| 7 | UnifiedQA-BASE - CoT (QCM→ALE) | 66.53 | No | Learn to Explain: Multimodal Reasoning via Thoug... | 2022-09-20 | Code |
| 8 | GPT-3 - CoT (QCM→AE, 2-shot) | 66.09 | No | Learn to Explain: Multimodal Reasoning via Thoug... | 2022-09-20 | Code |