Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

2023-02-27, arXiv 2023 2
Tasks: Question Answering, Few-Shot Learning, Math Word Problem Solving, Multi-task Language Understanding, Sentence Completion, Stereotypical Bias Analysis, Common Sense Reasoning, Arithmetic Reasoning, Code Generation, Zero-Shot Learning

Abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Reading Comprehension | RACE | Accuracy (High) | 51.6 | LLaMA 65B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (Middle) | 67.9 | LLaMA 65B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (High) | 48.3 | LLaMA 33B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (Middle) | 64.1 | LLaMA 33B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (High) | 47.2 | LLaMA 13B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (Middle) | 61.6 | LLaMA 13B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (High) | 46.9 | LLaMA 7B (zero-shot) |
| Reading Comprehension | RACE | Accuracy (Middle) | 61.1 | LLaMA 7B (zero-shot) |
| Few-Shot Learning | MedConceptsQA | Accuracy | 25.653 | meta-llama/Meta-Llama-3-8B-Instruct |
| Zero-Shot Learning | MedConceptsQA | Accuracy | 25.84 | meta-llama/Meta-Llama-3-8B-Instruct |
| Transfer Learning | MMLU | Average (%) | 68.9 | LLaMA 65B (fine-tuned) |
| Transfer Learning | MMLU | Average (%) | 63.4 | LLaMA 65B (5-shot) |
| Transfer Learning | MMLU | Average (%) | 57.8 | LLaMA 33B (5-shot) |
| Question Answering | SIQA | Accuracy | 52.3 | LLaMA 65B (zero-shot) |
| Question Answering | SIQA | Accuracy | 50.4 | LLaMA 13B (zero-shot) |
| Question Answering | SIQA | Accuracy | 50.4 | LLaMA 33B (zero-shot) |
| Question Answering | SIQA | Accuracy | 48.9 | LLaMA 7B (zero-shot) |
| Question Answering | Natural Questions | EM | 39.9 | LLaMA 65B (few-shot, k=64) |
| Question Answering | Natural Questions | EM | 35 | LLaMA 65B (few-shot, k=5) |
| Question Answering | Natural Questions | EM | 31 | LLaMA 65B (one-shot) |
| Question Answering | Natural Questions | EM | 24.9 | LLaMA 33B (zero-shot) |
| Question Answering | OBQA | Accuracy | 60.2 | LLaMA 65B (zero-shot) |
| Question Answering | OBQA | Accuracy | 58.6 | LLaMA 33B (zero-shot) |
| Question Answering | OBQA | Accuracy | 57.2 | LLaMA 7B (zero-shot) |
| Question Answering | OBQA | Accuracy | 56.4 | LLaMA 13B (zero-shot) |
| Question Answering | TruthfulQA | % info | 53 | LLaMA 65B |
| Question Answering | TruthfulQA | % true | 57 | LLaMA 65B |
| Question Answering | TruthfulQA | % info | 48 | LLaMA 33B |
| Question Answering | TruthfulQA | % true | 52 | LLaMA 33B |
| Question Answering | TruthfulQA | % info | 41 | LLaMA 13B |
| Question Answering | TruthfulQA | % true | 47 | LLaMA 13B |
| Question Answering | TruthfulQA | % info | 29 | LLaMA 7B |
| Question Answering | TruthfulQA | % true | 33 | LLaMA 7B |
| Question Answering | PIQA | Accuracy | 82.8 | LLaMA 65B (0-shot) |
| Question Answering | PIQA | Accuracy | 82.3 | LLaMA 33B (0-shot) |
| Question Answering | PIQA | Accuracy | 80.1 | LLaMA 13B (0-shot) |
| Question Answering | PIQA | Accuracy | 79.8 | LLaMA 7B (0-shot) |
| Question Answering | TimeQuestions | P@1 | 17.8 | Llama3 |
| Question Answering | BoolQ | Accuracy | 85.3 | LLaMA 65B (0-shot) |
| Question Answering | BoolQ | Accuracy | 83.1 | LLaMA 33B (0-shot) |
| Question Answering | BoolQ | Accuracy | 78.1 | LLaMA 13B (zero-shot) |
| Question Answering | BoolQ | Accuracy | 76.5 | LLaMA 7B (zero-shot) |
| Question Answering | TriviaQA | EM | 73 | LLaMA 65B (few-shot, k=64) |
| Question Answering | TriviaQA | EM | 72.6 | LLaMA 65B (few-shot, k=5) |
| Question Answering | TriviaQA | EM | 71.6 | LLaMA 65B (one-shot) |
| Question Answering | TriviaQA | EM | 68.2 | LLaMA 65B (zero-shot) |
| Question Answering | MATH | Accuracy | 20.5 | LLaMA 65B (maj1@k) |
| Question Answering | MATH | Parameters (Billions) | 65 | LLaMA 65B (maj1@k) |
| Question Answering | MATH | Accuracy | 15.2 | LLaMA 33B-maj1@k |
| Question Answering | MATH | Parameters (Billions) | 33 | LLaMA 33B-maj1@k |
| Question Answering | MATH | Accuracy | 10.6 | LLaMA 65B |
| Question Answering | MATH | Parameters (Billions) | 65 | LLaMA 65B |
| Question Answering | MATH | Accuracy | 8.8 | LLaMA 13B-maj1@k |
| Question Answering | MATH | Parameters (Billions) | 13 | LLaMA 13B-maj1@k |
| Question Answering | MATH | Accuracy | 7.1 | LLaMA 33B |
| Question Answering | MATH | Parameters (Billions) | 33 | LLaMA 33B |
| Question Answering | MATH | Accuracy | 6.9 | LLaMA 7B-maj1@k |
| Question Answering | MATH | Parameters (Billions) | 7 | LLaMA 7B-maj1@k |
| Question Answering | MATH | Accuracy | 3.9 | LLaMA 13B |
| Question Answering | MATH | Parameters (Billions) | 13 | LLaMA 13B |
| Question Answering | MATH | Accuracy | 2.9 | LLaMA 7B |
| Question Answering | MATH | Parameters (Billions) | 7 | LLaMA 7B |
| Code Generation | MBPP | Accuracy | 37.7 | LLaMA 65B (0-shot) |
| Code Generation | MBPP | Accuracy | 30.2 | LLaMA 33B (0-shot) |
| Code Generation | MBPP | Accuracy | 22 | LLaMA 13B (0-shot) |
| Code Generation | MBPP | Accuracy | 17.7 | LLaMA 7B (0-shot) |
| Common Sense Reasoning | WinoGrande | Accuracy | 77 | LLaMA 65B (0-shot) |
| Common Sense Reasoning | WinoGrande | Accuracy | 76 | LLaMA 33B (0-shot) |
| Common Sense Reasoning | WinoGrande | Accuracy | 73 | LLaMA 13B (0-shot) |
| Common Sense Reasoning | WinoGrande | Accuracy | 70.1 | LLaMA 7B (0-shot) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 57.8 | LLaMA 33B (zero-shot) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 56 | LLaMA 65B (zero-shot) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 52.7 | LLaMA 13B (zero-shot) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 47.6 | LLaMA 7B (zero-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 80 | LLaMA 33B (0-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 78.9 | LLaMA 65B (0-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 74.8 | LLaMA 13B (0-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 72.8 | LLaMA 7B (0-shot) |
| Math Word Problem Solving | MATH | Accuracy | 20.5 | LLaMA 65B (maj1@k) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 65 | LLaMA 65B (maj1@k) |
| Math Word Problem Solving | MATH | Accuracy | 15.2 | LLaMA 33B-maj1@k |
| Math Word Problem Solving | MATH | Parameters (Billions) | 33 | LLaMA 33B-maj1@k |
| Math Word Problem Solving | MATH | Accuracy | 10.6 | LLaMA 65B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 65 | LLaMA 65B |
| Math Word Problem Solving | MATH | Accuracy | 8.8 | LLaMA 13B-maj1@k |
| Math Word Problem Solving | MATH | Parameters (Billions) | 13 | LLaMA 13B-maj1@k |
| Math Word Problem Solving | MATH | Accuracy | 7.1 | LLaMA 33B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 33 | LLaMA 33B |
| Math Word Problem Solving | MATH | Accuracy | 6.9 | LLaMA 7B-maj1@k |
| Math Word Problem Solving | MATH | Parameters (Billions) | 7 | LLaMA 7B-maj1@k |
| Math Word Problem Solving | MATH | Accuracy | 3.9 | LLaMA 13B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 13 | LLaMA 13B |
| Math Word Problem Solving | MATH | Accuracy | 2.9 | LLaMA 7B |
| Math Word Problem Solving | MATH | Parameters (Billions) | 7 | LLaMA 7B |
| Meta-Learning | MedConceptsQA | Accuracy | 25.653 | meta-llama/Meta-Llama-3-8B-Instruct |
| Mathematical Question Answering | MATH | Accuracy | 20.5 | LLaMA 65B (maj1@k) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 65 | LLaMA 65B (maj1@k) |
| Mathematical Question Answering | MATH | Accuracy | 15.2 | LLaMA 33B-maj1@k |
| Mathematical Question Answering | MATH | Parameters (Billions) | 33 | LLaMA 33B-maj1@k |
| Mathematical Question Answering | MATH | Accuracy | 10.6 | LLaMA 65B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 65 | LLaMA 65B |
| Mathematical Question Answering | MATH | Accuracy | 8.8 | LLaMA 13B-maj1@k |
| Mathematical Question Answering | MATH | Parameters (Billions) | 13 | LLaMA 13B-maj1@k |
| Mathematical Question Answering | MATH | Accuracy | 7.1 | LLaMA 33B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 33 | LLaMA 33B |
| Mathematical Question Answering | MATH | Accuracy | 6.9 | LLaMA 7B-maj1@k |
| Mathematical Question Answering | MATH | Parameters (Billions) | 7 | LLaMA 7B-maj1@k |
| Mathematical Question Answering | MATH | Accuracy | 3.9 | LLaMA 13B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 13 | LLaMA 13B |
| Mathematical Question Answering | MATH | Accuracy | 2.9 | LLaMA 7B |
| Mathematical Question Answering | MATH | Parameters (Billions) | 7 | LLaMA 7B |
| Multi-Task Learning | MMLU | Average (%) | 68.9 | LLaMA 65B (fine-tuned) |
| Multi-Task Learning | MMLU | Average (%) | 63.4 | LLaMA 65B (5-shot) |
| Multi-Task Learning | MMLU | Average (%) | 57.8 | LLaMA 33B (5-shot) |
| Mathematical Reasoning | MATH | Accuracy | 20.5 | LLaMA 65B (maj1@k) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 65 | LLaMA 65B (maj1@k) |
| Mathematical Reasoning | MATH | Accuracy | 15.2 | LLaMA 33B-maj1@k |
| Mathematical Reasoning | MATH | Parameters (Billions) | 33 | LLaMA 33B-maj1@k |
| Mathematical Reasoning | MATH | Accuracy | 10.6 | LLaMA 65B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 65 | LLaMA 65B |
| Mathematical Reasoning | MATH | Accuracy | 8.8 | LLaMA 13B-maj1@k |
| Mathematical Reasoning | MATH | Parameters (Billions) | 13 | LLaMA 13B-maj1@k |
| Mathematical Reasoning | MATH | Accuracy | 7.1 | LLaMA 33B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 33 | LLaMA 33B |
| Mathematical Reasoning | MATH | Accuracy | 6.9 | LLaMA 7B-maj1@k |
| Mathematical Reasoning | MATH | Parameters (Billions) | 7 | LLaMA 7B-maj1@k |
| Mathematical Reasoning | MATH | Accuracy | 3.9 | LLaMA 13B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 13 | LLaMA 13B |
| Mathematical Reasoning | MATH | Accuracy | 2.9 | LLaMA 7B |
| Mathematical Reasoning | MATH | Parameters (Billions) | 7 | LLaMA 7B |
| Sentence Completion | HellaSwag | Accuracy | 84.2 | LLaMA 65B (0-shot) |
| Sentence Completion | HellaSwag | Accuracy | 82.8 | LLaMA 33B (0-shot) |
| Sentence Completion | HellaSwag | Accuracy | 79.2 | LLaMA 13B (0-shot) |
| Sentence Completion | HellaSwag | Accuracy | 76.1 | LLaMA 7B (0-shot) |
| Stereotypical Bias Analysis | CrowS-Pairs | Age | 70.1 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Disability | 66.7 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Gender | 70.6 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Nationality | 64.2 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Overall | 66.6 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Physical Appearance | 77.8 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Race/Color | 57 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Religion | 70.6 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Sexual Orientation | 81 | LLaMA 65B |
| Stereotypical Bias Analysis | CrowS-Pairs | Socioeconomic status | 71.5 | LLaMA 65B |
| Arithmetic Reasoning | GSM8K | Accuracy | 69.7 | LLaMA 65B-maj1@k |
| Arithmetic Reasoning | GSM8K | Parameters (Billion) | 65 | LLaMA 65B-maj1@k |
| Arithmetic Reasoning | GSM8K | Accuracy | 53.1 | LLaMA 33B-maj1@k |
| Arithmetic Reasoning | GSM8K | Parameters (Billion) | 33 | LLaMA 33B-maj1@k |
| Arithmetic Reasoning | GSM8K | Accuracy | 50.9 | LLaMA 65B |
| Arithmetic Reasoning | GSM8K | Parameters (Billion) | 65 | LLaMA 65B |
| Arithmetic Reasoning | GSM8K | Accuracy | 35.6 | LLaMA 33B |
| Arithmetic Reasoning | GSM8K | Parameters (Billion) | 33 | LLaMA 33B |
| Arithmetic Reasoning | GSM8K | Accuracy | 29.3 | LLaMA 13B-maj1@k |
| Arithmetic Reasoning | GSM8K | Parameters (Billion) | 13 | LLaMA 13B-maj1@k |
| Arithmetic Reasoning | GSM8K | Accuracy | 18.1 | LLaMA 7B (maj1@k) |
| Arithmetic Reasoning | GSM8K | Parameters (Billion) | 7 | LLaMA 7B (maj1@k) |
| Arithmetic Reasoning | GSM8K | Accuracy | 17.8 | LLaMA 13B |
| Arithmetic Reasoning | GSM8K | Parameters (Billion) | 13 | LLaMA 13B |
| Arithmetic Reasoning | GSM8K | Accuracy | 11 | LLaMA 7B |
| Arithmetic Reasoning | GSM8K | Parameters (Billion) | 7 | LLaMA 7B |
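
Several MATH and GSM8K rows above carry a maj1@k label, which this page does not define. In the LLaMA paper it denotes sampling k answers per problem and taking a majority vote. The sketch below is one way that scoring could be implemented, followed by the gain over single-answer accuracy computed from the GSM8K rows in the table. The function names and the tie-breaking rule (first answer seen wins) are assumptions made for illustration, not the authors' evaluation code.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the k sampled answers.

    Tie-breaking is an assumption: Counter.most_common keeps first-seen
    order among equal counts, so the earliest tied answer wins.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def maj1_at_k_accuracy(samples_per_problem, references):
    """Percentage of problems whose majority-voted answer matches the reference."""
    correct = sum(
        majority_vote(samples) == ref
        for samples, ref in zip(samples_per_problem, references)
    )
    return 100.0 * correct / len(references)


# GSM8K accuracies copied from the results table: (single answer, maj1@k).
GSM8K = {
    "LLaMA 7B": (11.0, 18.1),
    "LLaMA 13B": (17.8, 29.3),
    "LLaMA 33B": (35.6, 53.1),
    "LLaMA 65B": (50.9, 69.7),
}


def voting_gain(table):
    """Accuracy points gained by majority voting, per model."""
    return {model: round(voted - single, 1) for model, (single, voted) in table.items()}


if __name__ == "__main__":
    # Toy data (hypothetical): two problems, three sampled answers each.
    samples = [["18", "20", "18"], ["5", "7", "7"]]
    print(maj1_at_k_accuracy(samples, ["18", "5"]))
    for model, gain in voting_gain(GSM8K).items():
        print(f"{model}: +{gain} points with maj1@k")
```

On the reported GSM8K numbers the gain from voting grows with model size, from 7.1 points at 7B to 18.8 points at 65B.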

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts (2025-07-17)