Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models

Dao Xuan-Quy, Le Ngoc-Bich, Vo The-Duy, Phan Xuan-Dung, Ngo Bac-Bien, Nguyen Van-Tien, Nguyen Thi-My-Thanh, Nguyen Hong-Phuoc

2023-05-20 · Reading Comprehension · Question Answering · Text Generation · Question Rewriting · Multiple-choice · Visual Question Answering

Abstract

This article introduces the VNHSGE (VietNamese High School Graduation Examination) dataset, developed exclusively for evaluating large language models (LLMs). The dataset covers nine subjects and was generated from the Vietnamese National High School Graduation Examination and comparable tests. It includes 300 literary essays and over 19,000 multiple-choice questions on a range of topics. Because it contains both text and accompanying images, the dataset assesses LLMs in multitask settings such as question answering, text generation, reading comprehension, and visual question answering. We evaluated ChatGPT and BingChat on the VNHSGE dataset and compared their performance with that of Vietnamese students. The results show that both ChatGPT and BingChat perform at a human level in several areas, including literature, English, history, geography, and civics education. They still have room to improve, though, especially in mathematics, physics, chemistry, and biology. With its wide-ranging coverage and variety of tasks, the VNHSGE dataset seeks to provide an adequate benchmark for assessing the abilities of LLMs. By making this dataset available to the scientific community, we intend to promote future developments in LLMs, especially in addressing their limitations in mathematics and the natural sciences.

Results

Task                Dataset             Metric        Value  Model
Question Answering  VNHSGE-English      Accuracy (%)  92.4   Bing Chat
Question Answering  VNHSGE-English      Accuracy (%)  79.2   ChatGPT
Question Answering  VNHSGE-History      Accuracy (%)  88.5   Bing Chat
Question Answering  VNHSGE-History      Accuracy (%)  56.5   ChatGPT
Question Answering  VNHSGE-Biology      Accuracy (%)  69     Bing Chat
Question Answering  VNHSGE-Biology      Accuracy (%)  58     ChatGPT
Question Answering  VNHSGE Mathematics  Accuracy (%)  60     Bing Chat
Question Answering  VNHSGE Mathematics  Accuracy (%)  58.8   ChatGPT
Question Answering  VNHSGE-Civic        Accuracy (%)  85.5   Bing Chat
Question Answering  VNHSGE-Civic        Accuracy (%)  70.5   ChatGPT
Question Answering  VNHSGE-Literature   Accuracy (%)  68     ChatGPT
Question Answering  VNHSGE-Literature   Accuracy (%)  56.8   Bing Chat
Question Answering  VNHSGE-Physics      Accuracy (%)  66     Bing Chat
Question Answering  VNHSGE-Physics      Accuracy (%)  61     ChatGPT
Question Answering  VNHSGE-Geography    Accuracy (%)  85.5   Bing Chat
Question Answering  VNHSGE-Geography    Accuracy (%)  61.5   ChatGPT
Question Answering  VNHSGE-Chemistry    Accuracy (%)  52.5   Bing Chat
Question Answering  VNHSGE-Chemistry    Accuracy (%)  48     ChatGPT
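
The per-subject results above can be summarized with a little arithmetic. The following is a minimal Python sketch, not code from the paper: the `RESULTS` list is transcribed from the rows above, and the helper names `mean_accuracy` and `subject_gaps` are hypothetical. The unweighted mean across subjects is an assumption made here for illustration; the paper may aggregate scores differently.

```python
from collections import defaultdict

# (subject, model, accuracy in %) transcribed from the results listed above.
RESULTS = [
    ("English", "Bing Chat", 92.4), ("English", "ChatGPT", 79.2),
    ("History", "Bing Chat", 88.5), ("History", "ChatGPT", 56.5),
    ("Biology", "Bing Chat", 69.0), ("Biology", "ChatGPT", 58.0),
    ("Mathematics", "Bing Chat", 60.0), ("Mathematics", "ChatGPT", 58.8),
    ("Civic", "Bing Chat", 85.5), ("Civic", "ChatGPT", 70.5),
    ("Literature", "ChatGPT", 68.0), ("Literature", "Bing Chat", 56.8),
    ("Physics", "Bing Chat", 66.0), ("Physics", "ChatGPT", 61.0),
    ("Geography", "Bing Chat", 85.5), ("Geography", "ChatGPT", 61.5),
    ("Chemistry", "Bing Chat", 52.5), ("Chemistry", "ChatGPT", 48.0),
]


def mean_accuracy(results):
    """Unweighted mean accuracy per model across subjects (an assumption)."""
    scores = defaultdict(list)
    for _subject, model, accuracy in results:
        scores[model].append(accuracy)
    return {model: sum(vals) / len(vals) for model, vals in scores.items()}


def subject_gaps(results):
    """Bing Chat minus ChatGPT accuracy per subject, in percentage points."""
    by_subject = defaultdict(dict)
    for subject, model, accuracy in results:
        by_subject[subject][model] = accuracy
    return {
        subject: round(models["Bing Chat"] - models["ChatGPT"], 1)
        for subject, models in by_subject.items()
    }


if __name__ == "__main__":
    for model, mean in mean_accuracy(RESULTS).items():
        print(f"{model}: mean accuracy {mean:.1f}%")
    for subject, gap in sorted(subject_gaps(RESULTS).items(), key=lambda kv: kv[1]):
        print(f"{subject}: {gap:+.1f} points (Bing Chat - ChatGPT)")
```

Under this unweighted averaging, Bing Chat comes out at roughly 72.9% and ChatGPT at roughly 62.4%, and Literature is the only subject in which ChatGPT scores higher.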

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)