
CharXiv is a comprehensive evaluation suite for testing the chart understanding capabilities of Multimodal Large Language Models (MLLMs)¹². It was proposed to address the limitations of existing datasets that often focus on oversimplified and homogeneous charts with template-based questions¹².

Here are some key features of CharXiv:

It includes 2,323 natural, challenging, and diverse charts from arXiv papers¹².
CharXiv includes two types of questions¹²:
1. Descriptive questions, which examine basic chart elements.
2. Reasoning questions, which require synthesizing information across complex visual elements in the chart.
All charts and questions are handpicked, curated, and verified by human experts¹².
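
To make the two question types concrete, here is a minimal sketch of how per-type accuracy could be tallied. The `Question` record, its field names, the sample items, and the case-insensitive exact-match scoring are all assumptions made for illustration; they are not the benchmark's actual data format or grading procedure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical record layout, not CharXiv's real schema.
    chart_id: str
    kind: str    # "descriptive" or "reasoning"
    answer: str  # reference answer


def accuracy_by_kind(questions, predictions):
    """Return the fraction of correct predictions for each question type."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for question, prediction in zip(questions, predictions):
        total[question.kind] += 1
        # Assumed scoring rule: case-insensitive exact match.
        if prediction.strip().lower() == question.answer.strip().lower():
            correct[question.kind] += 1
    return {kind: correct[kind] / total[kind] for kind in total}


# Made-up items purely for demonstration.
questions = [
    Question("chart_0", "descriptive", "Time (s)"),
    Question("chart_0", "reasoning", "B"),
    Question("chart_1", "reasoning", "3"),
]
predictions = ["time (s)", "A", "3"]

scores = accuracy_by_kind(questions, predictions)
print(scores)
```

Reporting the two types separately, rather than as one pooled number, keeps a model's ability to read basic chart elements distinct from its ability to reason over them.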

The results from CharXiv reveal a substantial gap between the reasoning skills of the strongest proprietary model (i.e., GPT-4o), which achieves 47.1% accuracy, and the strongest open-source model (i.e., InternVL Chat V1.5), which achieves 29.2%². All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs².
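
The size of these gaps can be worked out directly from the three accuracies quoted above. The snippet below is a small arithmetic sketch; the dictionary keys and variable names are illustrative choices, and only the three percentages come from the text.

```python
# Accuracies (in percent) as quoted in the paragraph above.
reported = {
    "GPT-4o": 47.1,
    "InternVL Chat V1.5": 29.2,
    "Human": 80.5,
}

human = reported["Human"]

# Gap between each model and human performance, in percentage points.
gaps_to_human = {
    name: round(human - accuracy, 1)
    for name, accuracy in reported.items()
    if name != "Human"
}

# Gap between the strongest proprietary and strongest open-source model.
proprietary_vs_open = round(
    reported["GPT-4o"] - reported["InternVL Chat V1.5"], 1
)

print(gaps_to_human)
print(proprietary_vs_open)
```

Under these figures the proprietary model leads the open-source one by 17.9 percentage points, while still trailing human performance by 33.4 points.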

(1) [2406.18521] CharXiv: Charting Gaps in Realistic Chart Understanding in ... https://arxiv.org/abs/2406.18521
(2) CharXiv. https://charxiv.github.io/