Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding

Renqiu Xia, Haoyang Peng, Hancheng Ye, Mingsheng Li, Xiangchao Yan, Peng Ye, Botian Shi, Yu Qiao, Junchi Yan, Bo Zhang

2023-09-20 | Question Answering | Chart Question Answering | Chart Understanding | Large Language Model | Language Modelling

Abstract

Charts are common in the literature across various scientific fields, conveying rich information in a form easily accessible to readers. Current chart-related tasks focus on either chart perception, which extracts information from visual charts, or chart reasoning over the extracted data, e.g. in tabular form. In this paper, we introduce StructChart, a novel framework that leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach to chart perception and reasoning tasks, and that is generally applicable to different downstream tasks beyond the question-answering task specifically studied in peer works. Specifically, StructChart first reformulates the chart data from the tabular form (linearized CSV) to STR, which reduces the task gap between chart perception and reasoning. We then propose a Structuring Chart-oriented Representation Metric (SCRM) to quantitatively evaluate performance on the chart perception task. To augment the training data, we further explore the potential of Large Language Models (LLMs) to enhance the diversity of both chart visual style and statistical information. Extensive experiments on various chart-related tasks demonstrate the effectiveness and potential of a unified chart perception-reasoning paradigm for pushing the frontier of chart understanding.
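As a rough illustration of the CSV-to-STR reformulation mentioned in the abstract, the sketch below turns a small linearized CSV table into a list of triplets. The (row entity, column entity, value) layout, the function name csv_to_triplets, and the sample data are all assumptions made for illustration; the abstract does not specify the exact triplet format, so this should not be read as the authors' implementation.

```python
import csv
import io


def csv_to_triplets(csv_text):
    """Convert a small CSV table into (row_entity, column_entity, value) triplets.

    Assumed layout (hypothetical, not taken from the paper): the first row
    holds column headers and the first column holds row labels.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, body = rows[0], rows[1:]
    column_entities = header[1:]  # skip the corner cell above the row labels

    triplets = []
    for row in body:
        row_entity, values = row[0], row[1:]
        for column_entity, value in zip(column_entities, values):
            triplets.append(
                (row_entity.strip(), column_entity.strip(), value.strip())
            )
    return triplets


# Made-up sample data, used only to exercise the function.
example_csv = """
Year,Apples,Pears
2020,10,7
2021,12,9
"""

triplets = csv_to_triplets(example_csv)
for triplet in triplets:
    print(triplet)
```

One property of this assumed layout is that each triplet carries its own row and column labels, so the set of triplets stays the same if the rows of the table are reordered, whereas the linearized CSV text changes.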

Results

Task | Dataset | Metric | Value | Model
Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 65.3 | StructChart+GPT3.5 (STR ChartQA+SimChart9K)
Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 60.7 | StructChart+GPT3.5 (STR)
Chart Question Answering | ChartQA | 1:1 Accuracy | 65.3 | StructChart+GPT3.5 (STR ChartQA+SimChart9K)
Chart Question Answering | ChartQA | 1:1 Accuracy | 60.7 | StructChart+GPT3.5 (STR)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits (2025-07-18)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)