CharXiv
CharXiv is a comprehensive evaluation suite for testing the chart understanding capabilities of Multimodal Large Language Models (MLLMs)¹². It was proposed to address the limitations of existing datasets that often focus on oversimplified and homogeneous charts with template-based questions¹².
Here are some key features of CharXiv:
- It includes 2,323 natural, challenging, and diverse charts from arXiv papers¹².
- CharXiv includes two types of questions¹²:
  - Descriptive questions that examine basic chart elements.
  - Reasoning questions that require synthesizing information across complex visual elements in the chart.
- All charts and questions are handpicked, curated, and verified by human experts¹².
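To make the two question types concrete, here is a minimal sketch of how one might represent benchmark items and score a model separately on each type. The `ChartQuestion` record, its field names, the toy items, and the exact-match scoring are all assumptions made for illustration; they are not CharXiv's actual data format or grading procedure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ChartQuestion:
    # Hypothetical record layout, not the benchmark's real schema.
    chart_id: str
    question_type: str  # "descriptive" or "reasoning"
    question: str
    answer: str


def accuracy_by_type(items, predictions):
    """Return {question_type: accuracy}, using case-insensitive exact match.

    Exact match is a simplifying stand-in for whatever grading a real
    benchmark uses.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.question_type] += 1
        if pred.strip().lower() == item.answer.strip().lower():
            correct[item.question_type] += 1
    return {qtype: correct[qtype] / total[qtype] for qtype in total}


# Invented toy items, for illustration only.
items = [
    ChartQuestion("c1", "descriptive", "What is the x-axis label?", "Epoch"),
    ChartQuestion("c1", "reasoning", "Which curve converges first?", "Adam"),
    ChartQuestion("c2", "reasoning", "Which method has the widest spread?", "Baseline"),
]
predictions = ["epoch", "SGD", "Baseline"]
scores = accuracy_by_type(items, predictions)
```

Reporting accuracy per question type, rather than as one pooled number, keeps a model's ability to read basic chart elements separate from its ability to reason over them.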
The results from CharXiv reveal a substantial gap between the reasoning skills of the strongest proprietary model (i.e., GPT-4o), which achieves 47.1% accuracy, and the strongest open-source model (i.e., InternVL Chat V1.5), which achieves 29.2%². All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs².
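The size of these gaps follows from simple subtraction of the figures quoted above. The short sketch below works through the arithmetic; the dictionary and the `gap` helper are illustrative names, not part of any CharXiv tooling.

```python
# Accuracies (%) exactly as quoted in the paragraph above.
reported = {
    "GPT-4o": 47.1,
    "InternVL Chat V1.5": 29.2,
    "Human": 80.5,
}


def gap(a: str, b: str) -> float:
    """Accuracy of `a` minus accuracy of `b`, in percentage points."""
    return round(reported[a] - reported[b], 1)


proprietary_vs_open = gap("GPT-4o", "InternVL Chat V1.5")  # 17.9 points
human_vs_proprietary = gap("Human", "GPT-4o")              # 33.4 points
human_vs_open = gap("Human", "InternVL Chat V1.5")         # 51.3 points
```

On these numbers, the strongest proprietary model is closer to the strongest open-source model (17.9 points) than it is to human performance (33.4 points).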
(1) [2406.18521] CharXiv: Charting Gaps in Realistic Chart Understanding in .... https://arxiv.org/abs/2406.18521.
(2) CharXiv. https://charxiv.github.io/.
(3) ChinaXiv.org 中国科学院科技论文预发布平台. https://chinaxiv.org/home.htm.
(4) https://doi.org/10.48550/arXiv.2406.18521.