Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Reading Comprehension | RACE | Accuracy (High) | 51.6 | LLaMA 65B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (Middle) | 67.9 | LLaMA 65B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (High) | 48.3 | LLaMA 33B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (Middle) | 64.1 | LLaMA 33B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (High) | 47.2 | LLaMA 13B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (Middle) | 61.6 | LLaMA 13B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (High) | 46.9 | LLaMA 7B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (Middle) | 61.1 | LLaMA 7B (zero-shot) |
| Few-Shot Learning | MedConceptsQA | Accuracy | 25.653 | meta-llama/Meta-Llama-3-8B-Instruct |
| Zero-Shot Learning | MedConceptsQA | Accuracy | 25.84 | meta-llama/Meta-Llama-3-8B-Instruct |
| Transfer Learning | MMLU | Average (%) | 68.9 | LLaMA 65B (fine-tuned) |
| Transfer Learning | MMLU | Average (%) | 63.4 | LLaMA 65B (5-shot) |
| Transfer Learning | MMLU | Average (%) | 57.8 | LLaMA 33B (5-shot) |
| Question Answering | SIQA | Accuracy | 52.3 | LLaMA 65B (zero-shot) |
| Question Answering | SIQA | Accuracy | 50.4 | LLaMA 13B (zero-shot) |
| Question Answering | SIQA | Accuracy | 50.4 | LLaMA 33B (zero-shot) |
| Question Answering | SIQA | Accuracy | 48.9 | LLaMA 7B (zero-shot) |
| Question Answering | Natural Questions | EM | 39.9 | LLaMA 65B (few-shot, k=64) |
| Question Answering | Natural Questions | EM | 35 | LLaMA 65B (few-shot, k=5) |
| Question Answering | Natural Questions | EM | 31 | LLaMA 65B (one-shot) |
| Question Answering | Natural Questions | EM | 24.9 | LLaMA 33B (zero-shot) |
| Question Answering | OBQA | Accuracy | 60.2 | LLaMA 65B (zero-shot) |
| Question Answering | OBQA | Accuracy | 58.6 | LLaMA 33B (zero-shot) |
| Question Answering | OBQA | Accuracy | 57.2 | LLaMA 7B (zero-shot) |
| Question Answering | OBQA | Accuracy | 56.4 | LLaMA 13B (zero-shot) |
| Question Answering | TruthfulQA | % info | 53 | LLaMA 65B |
| Question Answering | TruthfulQA | % true | 57 | LLaMA 65B |
| Question Answering | TruthfulQA | % info | 48 | LLaMA 33B |
| Question Answering | TruthfulQA | % true | 52 | LLaMA 33B |
| Question Answering | TruthfulQA | % info | 41 | LLaMA 13B |
| Question Answering | TruthfulQA | % true | 47 | LLaMA 13B |
| Question Answering | TruthfulQA | % info | 29 | LLaMA 7B |
| Question Answering | TruthfulQA | % true | 33 | LLaMA 7B |
| Question Answering | PIQA | Accuracy | 82.8 | LLaMA 65B (0-shot) |
| Question Answering | PIQA | Accuracy | 82.3 | LLaMA 33B (0-shot) |
| Question Answering | PIQA | Accuracy | 80.1 | LLaMA 13B (0-shot) |
| Question Answering | PIQA | Accuracy | 79.8 | LLaMA 7B (0-shot) |
| Question Answering | TimeQuestions | P@1 | 17.8 | Llama3 |
| Question Answering | BoolQ | Accuracy | 85.3 | LLaMA 65B (zero-shot) |
| Question Answering | BoolQ | Accuracy | 83.1 | LLaMA 33B (zero-shot) |
| Question Answering | BoolQ | Accuracy | 78.1 | LLaMA 13B (zero-shot) |
| Question Answering | BoolQ | Accuracy | 76.5 | LLaMA 7B (zero-shot) |
| Question Answering | TriviaQA | EM | 73 | LLaMA 65B (few-shot, k=64) |
| Question Answering | TriviaQA | EM | 72.6 | LLaMA 65B (few-shot, k=5) |
| Question Answering | TriviaQA | EM | 71.6 | LLaMA 65B (one-shot) |
| Question Answering | TriviaQA | EM | 68.2 | LLaMA 65B (zero-shot) |
| Question Answering | MATH | Accuracy | 20.5 | LLaMA 65B (maj1@k) |
| Question Answering | MATH | Parameters (Billions) | 65 | LLaMA 65B (maj1@k) |
| Question Answering | MATH | Accuracy | 15.2 | LLaMA 33B (maj1@k) |
| Question Answering | MATH | Parameters (Billions) | 33 | LLaMA 33B (maj1@k) |
| Question Answering | MATH | Accuracy | 10.6 | LLaMA 65B |
| Question Answering | MATH | Parameters (Billions) | 65 | LLaMA 65B |
| Question Answering | MATH | Accuracy | 8.8 | LLaMA 13B (maj1@k) |
| Question Answering | MATH | Parameters (Billions) | 13 | LLaMA 13B (maj1@k) |
| Question Answering | MATH | Accuracy | 7.1 | LLaMA 33B |
| Question Answering | MATH | Parameters (Billions) | 33 | LLaMA 33B |
| Question Answering | MATH | Accuracy | 6.9 | LLaMA 7B (maj1@k) |
| Question Answering | MATH | Parameters (Billions) | 7 | LLaMA 7B (maj1@k) |
| Question Answering | MATH | Accuracy | 3.9 | LLaMA 13B |
| Question Answering | MATH | Parameters (Billions) | 13 | LLaMA 13B |
| Question Answering | MATH | Accuracy | 2.9 | LLaMA 7B |
| Question Answering | MATH | Parameters (Billions) | 7 | LLaMA 7B |
| Code Generation | MBPP | Accuracy | 37.7 | LLaMA 65B (0-shot) |
| Code Generation | MBPP | Accuracy | 30.2 | LLaMA 33B (0-shot) |
| Code Generation | MBPP | Accuracy | 22 | LLaMA 13B (0-shot) |
| Code Generation | MBPP | Accuracy | 17.7 | LLaMA 7B (0-shot) |
| Common Sense Reasoning | WinoGrande | Accuracy | 77 | LLaMA 65B (0-shot) |
| Common Sense Reasoning | WinoGrande | Accuracy | 76 | LLaMA 33B (0-shot) |
| Common Sense Reasoning | WinoGrande | Accuracy | 73 | LLaMA 13B (0-shot) |
| Common Sense Reasoning | WinoGrande | Accuracy | 70.1 | LLaMA 7B (0-shot) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 57.8 | LLaMA 33B (zero-shot) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 56 | LLaMA 65B (zero-shot) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 52.7 | LLaMA 13B (zero-shot) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 47.6 | LLaMA 7B (zero-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 80 | LLaMA 33B (0-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 78.9 | LLaMA 65B (0-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 74.8 | LLaMA 13B (0-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 72.8 | LLaMA 7B (0-shot) |
| Math Word Problem Solving | MATH | Accuracy | 20.5 | LLaMA 65B (maj1@k) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 65 | LLaMA 65B (maj1@k) |
| Math Word Problem Solving | MATH | Accuracy | 15.2 | LLaMA 33B (maj1@k) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 33 | LLaMA 33B (maj1@k) |
| Math Word Problem Solving | MATH | Accuracy | 10.6 | LLaMA 65B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 65 | LLaMA 65B |
| Math Word Problem Solving | MATH | Accuracy | 8.8 | LLaMA 13B (maj1@k) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 13 | LLaMA 13B (maj1@k) |
| Math Word Problem Solving | MATH | Accuracy | 7.1 | LLaMA 33B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 33 | LLaMA 33B |
| Math Word Problem Solving | MATH | Accuracy | 6.9 | LLaMA 7B (maj1@k) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 7 | LLaMA 7B (maj1@k) |
| Math Word Problem Solving | MATH | Accuracy | 3.9 | LLaMA 13B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 13 | LLaMA 13B |
| Math Word Problem Solving | MATH | Accuracy | 2.9 | LLaMA 7B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 7 | LLaMA 7B |
| Meta-Learning | MedConceptsQA | Accuracy | 25.653 | meta-llama/Meta-Llama-3-8B-Instruct |
| Mathematical Question Answering | MATH | Accuracy | 20.5 | LLaMA 65B (maj1@k) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 65 | LLaMA 65B (maj1@k) |
| Mathematical Question Answering | MATH | Accuracy | 15.2 | LLaMA 33B (maj1@k) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 33 | LLaMA 33B (maj1@k) |
| Mathematical Question Answering | MATH | Accuracy | 10.6 | LLaMA 65B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 65 | LLaMA 65B |
| Mathematical Question Answering | MATH | Accuracy | 8.8 | LLaMA 13B (maj1@k) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 13 | LLaMA 13B (maj1@k) |
| Mathematical Question Answering | MATH | Accuracy | 7.1 | LLaMA 33B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 33 | LLaMA 33B |
| Mathematical Question Answering | MATH | Accuracy | 6.9 | LLaMA 7B (maj1@k) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 7 | LLaMA 7B (maj1@k) |
| Mathematical Question Answering | MATH | Accuracy | 3.9 | LLaMA 13B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 13 | LLaMA 13B |
| Mathematical Question Answering | MATH | Accuracy | 2.9 | LLaMA 7B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 7 | LLaMA 7B |
| Multi-Task Learning | MMLU | Average (%) | 68.9 | LLaMA 65B (fine-tuned) |
| Multi-Task Learning | MMLU | Average (%) | 63.4 | LLaMA 65B (5-shot) |
| Multi-Task Learning | MMLU | Average (%) | 57.8 | LLaMA 33B (5-shot) |
| Mathematical Reasoning | MATH | Accuracy | 20.5 | LLaMA 65B (maj1@k) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 65 | LLaMA 65B (maj1@k) |
| Mathematical Reasoning | MATH | Accuracy | 15.2 | LLaMA 33B (maj1@k) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 33 | LLaMA 33B (maj1@k) |
| Mathematical Reasoning | MATH | Accuracy | 10.6 | LLaMA 65B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 65 | LLaMA 65B |
| Mathematical Reasoning | MATH | Accuracy | 8.8 | LLaMA 13B (maj1@k) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 13 | LLaMA 13B (maj1@k) |
| Mathematical Reasoning | MATH | Accuracy | 7.1 | LLaMA 33B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 33 | LLaMA 33B |
| Mathematical Reasoning | MATH | Accuracy | 6.9 | LLaMA 7B (maj1@k) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 7 | LLaMA 7B (maj1@k) |
| Mathematical Reasoning | MATH | Accuracy | 3.9 | LLaMA 13B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 13 | LLaMA 13B |
| Mathematical Reasoning | MATH | Accuracy | 2.9 | LLaMA 7B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 7 | LLaMA 7B |
| Sentence Completion | HellaSwag | Accuracy | 84.2 | LLaMA 65B (0-shot) |
| Sentence Completion | HellaSwag | Accuracy | 82.8 | LLaMA 33B (0-shot) |
| Sentence Completion | HellaSwag | Accuracy | 79.2 | LLaMA 13B (0-shot) |
| Sentence Completion | HellaSwag | Accuracy | 76.1 | LLaMA 7B (0-shot) |
| Stereotypical Bias Analysis | CrowS-Pairs | Age | 70.1 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Disability | 66.7 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Gender | 70.6 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Nationality | 64.2 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Overall | 66.6 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Physical Appearance | 77.8 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Race/Color | 57 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Religion | 70.6 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Sexual Orientation | 81 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Socioeconomic status | 71.5 | LLaMA 65B |
| Arithmetic Reasoning | GSM8K | Accuracy | 69.7 | LLaMA 65B (maj1@k) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 65 | LLaMA 65B (maj1@k) |
| Arithmetic Reasoning | GSM8K | Accuracy | 53.1 | LLaMA 33B (maj1@k) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 33 | LLaMA 33B (maj1@k) |
| Arithmetic Reasoning | GSM8K | Accuracy | 50.9 | LLaMA 65B |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 65 | LLaMA 65B |
| Arithmetic Reasoning | GSM8K | Accuracy | 35.6 | LLaMA 33B |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 33 | LLaMA 33B |
| Arithmetic Reasoning | GSM8K | Accuracy | 29.3 | LLaMA 13B (maj1@k) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 13 | LLaMA 13B (maj1@k) |
| Arithmetic Reasoning | GSM8K | Accuracy | 18.1 | LLaMA 7B (maj1@k) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 7 | LLaMA 7B (maj1@k) |
| Arithmetic Reasoning | GSM8K | Accuracy | 17.8 | LLaMA 13B |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 13 | LLaMA 13B |
| Arithmetic Reasoning | GSM8K | Accuracy | 11 | LLaMA 7B |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 7 | LLaMA 7B |