Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model on both human and automated benchmarks. Our models are released under the Apache 2.0 license.
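As a rough illustration of the sliding window attention mentioned above, the sketch below builds the boolean mask in which each token attends only to itself and the previous W - 1 positions. It is a minimal PyTorch sketch assuming the paper's window size W = 4096; the function and variable names are illustrative, not taken from the released implementation.

```python
# Minimal sketch of a sliding-window causal attention mask:
# query position i may attend to key positions j with max(0, i - W + 1) <= j <= i.
# Illustration only, not the released Mistral implementation.
import torch

def sliding_window_causal_mask(seq_len: int, window: int = 4096) -> torch.Tensor:
    """Boolean mask; True marks key positions a query is allowed to attend to."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (seq_len, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions,   shape (1, seq_len)
    causal = j <= i               # never attend to future tokens
    in_window = (i - j) < window  # keep only the last `window` keys
    return causal & in_window

# Small example with a window of 3 tokens:
print(sliding_window_causal_mask(seq_len=6, window=3).int())
```

Because each layer lets a position look a further W tokens into the past, information can propagate roughly L x W tokens back through a stack of L such layers, which is how a fixed window still covers long sequences at reduced cost.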
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Multi-Task Learning | MMLU | Average (%) | 60.1 | Mistral 7B (5-shot) |
| Question Answering | PeerQA | AlignScore | 0.0827 | Mistral-v02-7B-32k |
| Question Answering | PeerQA | Prometheus-2 Answer Correctness | 3.4245 | Mistral-v02-7B-32k |
| Question Answering | PeerQA | Rouge-L | 0.1922 | Mistral-v02-7B-32k |
| Question Answering | Natural Questions | EM | 28.8 | Mistral 7B (5-shot) |
| Question Answering | PIQA | Accuracy | 83.0 | Mistral 7B (0-shot) |
| Question Answering | TriviaQA | EM | 69.9 | Mistral 7B (5-shot) |
| Video Question Answering | NExT-QA | Accuracy | 51.1 | Mistral (7B) |
| Video Question Answering | NExT-GQA | Acc@GQA | 9.2 | Mistral (7B) |
| Video Question Answering | IntentQA | Accuracy | 50.4 | Mistral (7B) |
| Code Generation | MBPP | Accuracy | 47.5 | Mistral 7B (3-shot) |
| Common Sense Reasoning | WinoGrande | Accuracy | 75.3 | Mistral 7B (0-shot) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 55.5 | Mistral 7B (0-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 80.0 | Mistral 7B (0-shot) |
| Math Word Problem Solving | MATH | Accuracy | 13.1 | Mistral 7B (maj@4) |
| Sentence Completion | HellaSwag | Accuracy | 81.3 | Mistral 7B (0-shot) |
| Arithmetic Reasoning | GSM8K | Accuracy | 52.2 | Mistral 7B (maj@8) |
| Answerability Prediction | PeerQA | Macro F1 | 0.4703 | Mistral-IT-v02-7B-32k |
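The maj@4 and maj@8 entries above refer to majority voting over sampled completions: each problem is answered k times and the most frequent answer is scored. A hedged sketch of that scoring protocol (the helper names below are illustrative, not from the paper) is:

```python
# Sketch of maj@k scoring: sample k answers per problem, take the most
# frequent one, and count it as correct if it matches the reference answer.
from collections import Counter

def majority_vote(sampled_answers: list[str]) -> str:
    """Return the most common answer among k sampled completions."""
    return Counter(sampled_answers).most_common(1)[0][0]

def maj_at_k_accuracy(samples_per_problem: list[list[str]], gold: list[str]) -> float:
    """Accuracy when each problem is answered by majority vote over its samples."""
    correct = sum(majority_vote(s) == g for s, g in zip(samples_per_problem, gold))
    return correct / len(gold)

# Example: 2 problems, 4 samples each (maj@4); the second vote is wrong.
print(maj_at_k_accuracy([["8", "8", "7", "8"], ["12", "10", "10", "9"]], ["8", "12"]))
# -> 0.5
```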