Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Llemma: An Open Language Model For Mathematics

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck

Date: 2023-10-16
Tasks: Math, Automated Theorem Proving, Large Language Model, Arithmetic Reasoning, Language Modelling

Abstract

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Automated Theorem Proving | miniF2F-test | Pass@32 | 26.2 | LLEMMA-7b |
| Automated Theorem Proving | miniF2F-test | cumulative | 26.2 | LLEMMA-7b |
| Automated Theorem Proving | miniF2F-test | Pass@32 | 25.8 | LLEMMA-34b |
| Automated Theorem Proving | miniF2F-test | cumulative | 25.8 | LLEMMA-34b |
| Mathematical Proofs | miniF2F-test | Pass@32 | 26.2 | LLEMMA-7b |
| Mathematical Proofs | miniF2F-test | cumulative | 26.2 | LLEMMA-7b |
| Mathematical Proofs | miniF2F-test | Pass@32 | 25.8 | LLEMMA-34b |
| Mathematical Proofs | miniF2F-test | cumulative | 25.8 | LLEMMA-34b |
| Arithmetic Reasoning | GSM8K | Accuracy | 51.5 | Llemma 34B |
| Arithmetic Reasoning | GSM8K | Parameters (Billion) | 34 | Llemma 34B |
| Arithmetic Reasoning | GSM8K | Accuracy | 36.4 | Llemma 7B |
| Arithmetic Reasoning | GSM8K | Parameters (Billion) | 7 | Llemma 7B |
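
The results above report Pass@32 but do not say how it was computed. As a rough illustration only, one widely used definition of pass@k is the combinatorial estimator 1 - C(n-c, k) / C(n, k), where n attempts are sampled per problem and c of them succeed. The sketch below assumes that definition. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from the Llemma codebase, and the figures in the table may have been obtained with a different protocol.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (assumed definition, see lead-in).

    n: number of attempts sampled for the problem
    c: number of those attempts that succeeded
    k: attempt budget being evaluated (k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets containing no successful attempt;
    # it is 0 whenever fewer than k failures exist, giving pass@k = 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(outcomes: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in outcomes) / len(outcomes)
```

A benchmark-level score would then be the mean over all problems, typically reported as a percentage, for example `100 * mean_pass_at_k(outcomes, k=32)` where `outcomes` holds one hypothetical `(n, c)` pair per theorem.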
