
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Solving Quantitative Reasoning Problems with Language Models

Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra

2022-06-29
Math Word Problem Solving, Multi-task Language Understanding, Natural Language Understanding, Large Language Model, Arithmetic Reasoning, Language Modelling

Abstract

Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.

Results

Dataset: MATH (listed identically under the tasks Question Answering, Math Word Problem Solving, Mathematical Question Answering, and Mathematical Reasoning)

| Model | Accuracy (%) | Parameters (Billions) |
| --- | --- | --- |
| Minerva 62B (maj5@256) | 64.9 | 62 |
| Minerva 540B (maj1@k, k=64) | 50.3 | not listed |
| Minerva 8B (maj5@256) | 47.6 | 8 |
| Minerva 62B (maj1@k, k=64) | 43.4 | 62 |
| Minerva 540B | 33.6 | 540 |
| Minerva 62B (4-shot) | 27.6 | 62 |
| Minerva 8B (maj1@k, k=64) | 25.4 | 8 |
| davinci-002 175B | 19.1 | 175 |
| Minerva 8B | 14.1 | 8 |
| PaLM 540B | 8.8 | 540 |
| PaLM 8B (fine-tuned) | 5.6 | 8 |
| PaLM 62B | 4.4 | 62 |
| PaLM 8B | 1.5 | 8 |

Dataset: GSM8K (task: Arithmetic Reasoning)

| Model | Accuracy (%) | Parameters (Billions) |
| --- | --- | --- |
| Minerva 62B (maj5@100) | 89 | 62 |
| Minerva 540B (CoT) | 78.5 | 540 |
| Minerva 62B (maj1@100) | 68.5 | 62 |
| Minerva 8B (maj5@100) | 56.8 | 8 |
| PaLM 540B (8-shot) | 56.5 | 540 |
| Minerva 62B (8-shot) | 52.4 | 62 |
| PaLM 62B (8-shot) | 33 | 62 |
| Minerva 8B-maj1@k (8-shot) | 28.4 | 8 |
| Minerva 8B (8-shot) | 16.2 | 8 |
| PaLM 8B (8-shot) | 4.1 | 8 |
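
Several model labels in the results carry a maj1@k suffix. As commonly used, this denotes majority voting: k solutions are sampled per problem, and the most frequent final answer is taken as the prediction. The Python sketch below illustrates that scoring scheme only. The function names and the toy data are hypothetical, answers are assumed to be already extracted and normalized to strings, and the sketch does not cover the maj5 variants or reproduce the authors' evaluation code.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the sampled answers.

    Ties go to the answer that was sampled first, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def maj1_at_k_accuracy(samples_per_problem, references):
    """Percentage of problems whose majority-voted answer equals the reference."""
    if len(samples_per_problem) != len(references):
        raise ValueError("one reference answer is required per problem")
    correct = sum(
        majority_vote(samples) == reference
        for samples, reference in zip(samples_per_problem, references)
    )
    return 100.0 * correct / len(references)


# Toy data (hypothetical): two problems, k = 3 sampled answers each.
toy_samples = [["4", "5", "4"], ["1", "2", "2"]]
toy_references = ["4", "1"]
toy_accuracy = maj1_at_k_accuracy(toy_samples, toy_references)
```

In the toy data the first problem is scored correct because "4" wins the vote, while the second is scored incorrect because the majority answer "2" differs from the reference "1".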
