Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

Shubham Toshniwal, Ivan Moshkov, Sean Narenthiran, Daria Gitman, Fei Jia, Igor Gitman

Published: 2024-02-15
Tags: Math, Math Word Problem Solving, GSM8K, Arithmetic Reasoning

Abstract

Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Yu et al., 2024) and MAmmoTH (Yue et al., 2024) are constructed using outputs from closed-source LLMs with commercially restrictive licenses. A key reason limiting the use of open-source LLMs in these data generation pipelines has been the wide gap between the mathematical skills of the best closed-source LLMs, such as GPT-4, and the best open-source LLMs. Building on the recent progress in open-source LLMs, our proposed prompting novelty, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. The dataset is constructed by synthesizing code-interpreter solutions for GSM8K and MATH, two popular math reasoning benchmarks, using the recently released and permissively licensed Mixtral model. Our best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves a score of 84.6% on GSM8K and 50.7% on MATH, which is competitive with the best gpt-distilled models. We release our code, models, and the OpenMathInstruct-1 dataset under a commercially permissive license.

Results

The MATH, MAWPS, ASDiv-A, and SVAMP results are listed identically under four tasks: Question Answering, Math Word Problem Solving, Mathematical Question Answering, and Mathematical Reasoning. The GSM8K results are listed under Arithmetic Reasoning.

| Dataset | Metric | Value | Parameters (Billions) | Model |
|---|---|---|---|---|
| MATH | Accuracy | 60.4 | 70 | OpenMath-CodeLlama-70B (w/ code, SC, k=50) |
| MATH | Accuracy | 60.2 | 34 | OpenMath-CodeLlama-34B (w/ code, SC, k=50) |
| MATH | Accuracy | 58.3 | 70 | OpenMath-Llama2-70B (w/ code, SC, k=50) |
| MATH | Accuracy | 57.6 | 13 | OpenMath-CodeLlama-13B (w/ code, SC, k=50) |
| MATH | Accuracy | 57.2 | 7 | OpenMath-Mistral-7B (w/ code, SC, k=50) |
| MATH | Accuracy | 55.6 | 7 | OpenMath-CodeLlama-7B (w/ code, SC, k=50) |
| MATH | Accuracy | 50.7 | 70 | OpenMath-CodeLlama-70B (w/ code) |
| MATH | Accuracy | 48.3 | 34 | OpenMath-CodeLlama-34B (w/ code) |
| MATH | Accuracy | 46.3 | 70 | OpenMath-Llama2-70B (w/ code) |
| MATH | Accuracy | 45.5 | 13 | OpenMath-CodeLlama-13B (w/ code) |
| MATH | Accuracy | 44.5 | 7 | OpenMath-Mistral-7B (w/ code) |
| MATH | Accuracy | 43.6 | 7 | OpenMath-CodeLlama-7B (w/ code) |
| MAWPS | Accuracy (%) | 95.7 | | OpenMath-CodeLlama-70B (w/ code) |
| ASDiv-A | Execution Accuracy | 84.7 | | OpenMath-CodeLlama-70B (w/ code) |
| SVAMP | Execution Accuracy | 87.8 | | OpenMath-CodeLlama-70B (w/ code) |
| GSM8K | Accuracy | 90.8 | 70 | OpenMath-CodeLlama-70B (w/ code, SC, k=50) |
| GSM8K | Accuracy | 90.1 | 70 | OpenMath-Llama2-70B (w/ code, SC, k=50) |
| GSM8K | Accuracy | 88 | 34 | OpenMath-CodeLlama-34B (w/ code, SC, k=50) |
| GSM8K | Accuracy | 86.9 | 7 | OpenMath-Mistral-7B (w/ code, SC, k=50) |
| GSM8K | Accuracy | 86.8 | 13 | OpenMath-CodeLlama-13B (w/ code, SC, k=50) |
| GSM8K | Accuracy | 84.8 | 7 | OpenMath-CodeLlama-7B (w/ code, SC, k=50) |
| GSM8K | Accuracy | 84.7 | 70 | OpenMath-Llama2-70B (w/ code) |
| GSM8K | Accuracy | 84.6 | 70 | OpenMath-CodeLlama-70B (w/ code) |
| GSM8K | Accuracy | 80.7 | 34 | OpenMath-CodeLlama-34B (w/ code) |
| GSM8K | Accuracy | 80.2 | 7 | OpenMath-Mistral-7B (w/ code) |
| GSM8K | Accuracy | 78.8 | 13 | OpenMath-CodeLlama-13B (w/ code) |
| GSM8K | Accuracy | 75.9 | 7 | OpenMath-CodeLlama-7B (w/ code) |
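
The rows marked "SC, k=50" most likely refer to self-consistency decoding: sample k = 50 solutions per problem and take the most frequent final answer. The page does not describe the procedure, so the following is only a minimal sketch of majority voting under that assumption. The function names `majority_vote` and `self_consistency_accuracy` are hypothetical, and answers are compared as plain strings, which a real evaluation would probably replace with a more careful equivalence check.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent non-empty answer, or None if none is usable.

    Ties are broken in favor of the answer that was sampled first.
    """
    valid = [a.strip() for a in answers if a is not None and a.strip()]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


def self_consistency_accuracy(sampled_answers, references):
    """Percentage of problems whose majority-voted answer equals the reference.

    sampled_answers: one list of k final answers (strings) per problem.
    references: one reference answer (string) per problem.
    """
    correct = sum(
        majority_vote(samples) == ref
        for samples, ref in zip(sampled_answers, references)
    )
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy data with k = 3 samples per problem (illustrative only, not from the paper).
    samples = [["42", "41", "42"], ["7", "8", "8"]]
    refs = ["42", "7"]
    print(self_consistency_accuracy(samples, refs))
```

In this toy run the first problem is voted correctly and the second is not, which illustrates why the voted score can differ from the single-sample score reported in the "w/ code" rows.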

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems (2025-07-17)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression (2025-07-16)
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing (2025-07-15)
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)