CharXiv

Introduced 2024-06-26

CharXiv is a comprehensive evaluation suite for testing the chart understanding capabilities of Multimodal Large Language Models (MLLMs)¹². It was proposed to address a limitation of existing datasets, which often focus on oversimplified, homogeneous charts paired with template-based questions¹².

Key features of CharXiv:

  • It includes 2,323 natural, challenging, and diverse charts from arXiv papers¹².
  • CharXiv includes two types of questions¹²:
    1. Descriptive questions that examine basic chart elements.
    2. Reasoning questions that require synthesizing information across complex visual elements in the chart.
  • All charts and questions are handpicked, curated, and verified by human experts¹².

The results from CharXiv reveal a substantial gap between the reasoning skills of the strongest proprietary model (i.e., GPT-4o), which achieves 47.1% accuracy, and the strongest open-source model (i.e., InternVL Chat V1.5), which achieves 29.2%². All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs².
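The size of these gaps can be worked out directly from the figures quoted above. The following is a minimal sketch, not part of CharXiv itself: the dictionary and the `gap` helper are hypothetical names introduced for illustration, and it assumes the 80.5% human figure is measured on the same reasoning questions as the two model scores.

```python
# Accuracies (percent) as quoted in the paragraph above.
# Assumption: all three numbers refer to the same reasoning-question split.
reasoning_accuracy = {
    "GPT-4o": 47.1,
    "InternVL Chat V1.5": 29.2,
    "Human": 80.5,
}


def gap(a: str, b: str) -> float:
    """Return accuracy of `a` minus accuracy of `b`, in percentage points."""
    # Round to one decimal place to avoid floating-point noise.
    return round(reasoning_accuracy[a] - reasoning_accuracy[b], 1)


proprietary_vs_open = gap("GPT-4o", "InternVL Chat V1.5")
human_vs_proprietary = gap("Human", "GPT-4o")
human_vs_open = gap("Human", "InternVL Chat V1.5")

print(f"Proprietary vs. open-source gap: {proprietary_vs_open} points")
print(f"Human vs. best proprietary gap:  {human_vs_proprietary} points")
print(f"Human vs. best open-source gap:  {human_vs_open} points")
```

Under these assumptions, the strongest proprietary model leads the strongest open-source model by 17.9 percentage points, while still trailing human performance by 33.4 points.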

(1) [2406.18521] CharXiv: Charting Gaps in Realistic Chart Understanding in .... https://arxiv.org/abs/2406.18521. https://doi.org/10.48550/arXiv.2406.18521.
(2) CharXiv. https://charxiv.github.io/.