Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra
Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | MATH | Accuracy | 64.9 | Minerva 62B (maj5@256) |
| Question Answering | MATH | Parameters (Billions) | 62 | Minerva 62B (maj5@256) |
| Question Answering | MATH | Accuracy | 50.3 | Minerva 540B (maj1@k, k=64) |
| Question Answering | MATH | Accuracy | 47.6 | Minerva 8B (maj5@256) |
| Question Answering | MATH | Parameters (Billions) | 8 | Minerva 8B (maj5@256) |
| Question Answering | MATH | Accuracy | 43.4 | Minerva 62B (maj1@k, k=64) |
| Question Answering | MATH | Parameters (Billions) | 62 | Minerva 62B (maj1@k, k=64) |
| Question Answering | MATH | Accuracy | 33.6 | Minerva 540B |
| Question Answering | MATH | Parameters (Billions) | 540 | Minerva 540B |
| Question Answering | MATH | Accuracy | 27.6 | Minerva 62B (4-shot) |
| Question Answering | MATH | Parameters (Billions) | 62 | Minerva 62B (4-shot) |
| Question Answering | MATH | Accuracy | 25.4 | Minerva 8B (maj1@k, k=64) |
| Question Answering | MATH | Parameters (Billions) | 8 | Minerva 8B (maj1@k, k=64) |
| Question Answering | MATH | Accuracy | 19.1 | davinci-002 175B |
| Question Answering | MATH | Parameters (Billions) | 175 | davinci-002 175B |
| Question Answering | MATH | Accuracy | 14.1 | Minerva 8B |
| Question Answering | MATH | Parameters (Billions) | 8 | Minerva 8B |
| Question Answering | MATH | Accuracy | 8.8 | PaLM 540B |
| Question Answering | MATH | Parameters (Billions) | 540 | PaLM 540B |
| Question Answering | MATH | Accuracy | 5.6 | PaLM 8B (fine-tuned) |
| Question Answering | MATH | Parameters (Billions) | 8 | PaLM 8B (fine-tuned) |
| Question Answering | MATH | Accuracy | 4.4 | PaLM 62B |
| Question Answering | MATH | Parameters (Billions) | 62 | PaLM 62B |
| Question Answering | MATH | Accuracy | 1.5 | PaLM 8B |
| Question Answering | MATH | Parameters (Billions) | 8 | PaLM 8B |
| Math Word Problem Solving | MATH | Accuracy | 64.9 | Minerva 62B (maj5@256) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 62 | Minerva 62B (maj5@256) |
| Math Word Problem Solving | MATH | Accuracy | 50.3 | Minerva 540B (maj1@k, k=64) |
| Math Word Problem Solving | MATH | Accuracy | 47.6 | Minerva 8B (maj5@256) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 8 | Minerva 8B (maj5@256) |
| Math Word Problem Solving | MATH | Accuracy | 43.4 | Minerva 62B (maj1@k, k=64) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 62 | Minerva 62B (maj1@k, k=64) |
| Math Word Problem Solving | MATH | Accuracy | 33.6 | Minerva 540B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 540 | Minerva 540B |
| Math Word Problem Solving | MATH | Accuracy | 27.6 | Minerva 62B (4-shot) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 62 | Minerva 62B (4-shot) |
| Math Word Problem Solving | MATH | Accuracy | 25.4 | Minerva 8B (maj1@k, k=64) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 8 | Minerva 8B (maj1@k, k=64) |
| Math Word Problem Solving | MATH | Accuracy | 19.1 | davinci-002 175B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 175 | davinci-002 175B |
| Math Word Problem Solving | MATH | Accuracy | 14.1 | Minerva 8B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 8 | Minerva 8B |
| Math Word Problem Solving | MATH | Accuracy | 8.8 | PaLM 540B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 540 | PaLM 540B |
| Math Word Problem Solving | MATH | Accuracy | 5.6 | PaLM 8B (fine-tuned) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 8 | PaLM 8B (fine-tuned) |
| Math Word Problem Solving | MATH | Accuracy | 4.4 | PaLM 62B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 62 | PaLM 62B |
| Math Word Problem Solving | MATH | Accuracy | 1.5 | PaLM 8B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 8 | PaLM 8B |
| Mathematical Question Answering | MATH | Accuracy | 64.9 | Minerva 62B (maj5@256) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 62 | Minerva 62B (maj5@256) |
| Mathematical Question Answering | MATH | Accuracy | 50.3 | Minerva 540B (maj1@k, k=64) |
| Mathematical Question Answering | MATH | Accuracy | 47.6 | Minerva 8B (maj5@256) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 8 | Minerva 8B (maj5@256) |
| Mathematical Question Answering | MATH | Accuracy | 43.4 | Minerva 62B (maj1@k, k=64) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 62 | Minerva 62B (maj1@k, k=64) |
| Mathematical Question Answering | MATH | Accuracy | 33.6 | Minerva 540B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 540 | Minerva 540B |
| Mathematical Question Answering | MATH | Accuracy | 27.6 | Minerva 62B (4-shot) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 62 | Minerva 62B (4-shot) |
| Mathematical Question Answering | MATH | Accuracy | 25.4 | Minerva 8B (maj1@k, k=64) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 8 | Minerva 8B (maj1@k, k=64) |
| Mathematical Question Answering | MATH | Accuracy | 19.1 | davinci-002 175B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 175 | davinci-002 175B |
| Mathematical Question Answering | MATH | Accuracy | 14.1 | Minerva 8B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 8 | Minerva 8B |
| Mathematical Question Answering | MATH | Accuracy | 8.8 | PaLM 540B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 540 | PaLM 540B |
| Mathematical Question Answering | MATH | Accuracy | 5.6 | PaLM 8B (fine-tuned) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 8 | PaLM 8B (fine-tuned) |
| Mathematical Question Answering | MATH | Accuracy | 4.4 | PaLM 62B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 62 | PaLM 62B |
| Mathematical Question Answering | MATH | Accuracy | 1.5 | PaLM 8B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 8 | PaLM 8B |
| Mathematical Reasoning | MATH | Accuracy | 64.9 | Minerva 62B (maj5@256) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 62 | Minerva 62B (maj5@256) |
| Mathematical Reasoning | MATH | Accuracy | 50.3 | Minerva 540B (maj1@k, k=64) |
| Mathematical Reasoning | MATH | Accuracy | 47.6 | Minerva 8B (maj5@256) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 8 | Minerva 8B (maj5@256) |
| Mathematical Reasoning | MATH | Accuracy | 43.4 | Minerva 62B (maj1@k, k=64) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 62 | Minerva 62B (maj1@k, k=64) |
| Mathematical Reasoning | MATH | Accuracy | 33.6 | Minerva 540B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 540 | Minerva 540B |
| Mathematical Reasoning | MATH | Accuracy | 27.6 | Minerva 62B (4-shot) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 62 | Minerva 62B (4-shot) |
| Mathematical Reasoning | MATH | Accuracy | 25.4 | Minerva 8B (maj1@k, k=64) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 8 | Minerva 8B (maj1@k, k=64) |
| Mathematical Reasoning | MATH | Accuracy | 19.1 | davinci-002 175B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 175 | davinci-002 175B |
| Mathematical Reasoning | MATH | Accuracy | 14.1 | Minerva 8B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 8 | Minerva 8B |
| Mathematical Reasoning | MATH | Accuracy | 8.8 | PaLM 540B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 540 | PaLM 540B |
| Mathematical Reasoning | MATH | Accuracy | 5.6 | PaLM 8B (fine-tuned) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 8 | PaLM 8B (fine-tuned) |
| Mathematical Reasoning | MATH | Accuracy | 4.4 | PaLM 62B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 62 | PaLM 62B |
| Mathematical Reasoning | MATH | Accuracy | 1.5 | PaLM 8B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 8 | PaLM 8B |
| Arithmetic Reasoning | GSM8K | Accuracy | 89 | Minerva 62B (maj5@100) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 62 | Minerva 62B (maj5@100) |
| Arithmetic Reasoning | GSM8K | Accuracy | 78.5 | Minerva 540B (CoT) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 540 | Minerva 540B (CoT) |
| Arithmetic Reasoning | GSM8K | Accuracy | 68.5 | Minerva 62B (maj1@100) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 62 | Minerva 62B (maj1@100) |
| Arithmetic Reasoning | GSM8K | Accuracy | 56.8 | Minerva 8B (maj5@100) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 8 | Minerva 8B (maj5@100) |
| Arithmetic Reasoning | GSM8K | Accuracy | 56.5 | PaLM 540B (8-shot) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 540 | PaLM 540B (8-shot) |
| Arithmetic Reasoning | GSM8K | Accuracy | 52.4 | Minerva 62B (8-shot) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 62 | Minerva 62B (8-shot) |
| Arithmetic Reasoning | GSM8K | Accuracy | 33 | PaLM 62B (8-shot) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 62 | PaLM 62B (8-shot) |
| Arithmetic Reasoning | GSM8K | Accuracy | 28.4 | Minerva 8B (maj1@k, 8-shot) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 8 | Minerva 8B (maj1@k, 8-shot) |
| Arithmetic Reasoning | GSM8K | Accuracy | 16.2 | Minerva 8B (8-shot) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 8 | Minerva 8B (8-shot) |
| Arithmetic Reasoning | GSM8K | Accuracy | 4.1 | PaLM 8B (8-shot) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 8 | PaLM 8B (8-shot) |
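
Several model labels in the table carry suffixes such as `maj1@k, k=64` or `maj5@256`, which the table itself does not define. A common reading is majority voting: sample k solutions per problem, extract each final answer, and count the problem as solved if the reference answer is among the n most frequent answers. The sketch below illustrates that reading only. The function names, the plain string comparison of answers, and the tie-breaking rule (first-seen answer wins) are assumptions made for illustration, not the authors' evaluation code.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(answers: Sequence[str], top_n: int = 1) -> List[str]:
    """Return the top_n most frequent final answers among the k samples.

    Hypothetical helper: answers are compared as stripped strings, and
    ties are broken by the order in which answers first appear.
    """
    counts = Counter(a.strip() for a in answers)
    return [answer for answer, _ in counts.most_common(top_n)]


def maj_n_at_k_correct(answers: Sequence[str], reference: str, top_n: int = 1) -> bool:
    """Score one problem under the assumed majN@k rule.

    The problem counts as solved if the reference answer is among the
    top_n most common sampled answers (top_n=1 corresponds to maj1@k).
    """
    return reference.strip() in majority_vote(answers, top_n)


# Six sampled final answers for one problem (k = 6), made up for illustration.
samples = ["42", "41", "42", "7", "41", "42"]
print(majority_vote(samples))                       # most common answer
print(maj_n_at_k_correct(samples, "41", top_n=1))   # maj1@6 against reference "41"
print(maj_n_at_k_correct(samples, "41", top_n=2))   # maj2@6 against reference "41"
```

Under this reading, accuracy for a label such as `maj1@k, k=64` would be the fraction of benchmark problems for which the scoring function returns true with 64 samples per problem.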