Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Orca 2: Teaching Small Language Models How to Reason

Arindam Mitra, Luciano del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agarwal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour, Ahmed Awadallah

2023-11-18
Tasks: Reading Comprehension, Question Answering, Mathematical Reasoning, Multi-task Language Understanding, Counterfactual Reasoning, Imitation Learning, Common Sense Reasoning, Crass AI, Arithmetic Reasoning

Abstract

Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs' reasoning abilities. Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks, potentially different from the one used by the larger model. For example, while larger models might provide a direct answer to a complex task, smaller models may not have the same capacity. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. We evaluate Orca 2 using a comprehensive set of 15 diverse benchmarks (corresponding to approximately 100 tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of similar size and attains performance levels similar to or better than those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. We make Orca 2 weights publicly available at aka.ms/orca-lm to support research on the development, evaluation, and alignment of smaller LMs.

Results

Task                    Dataset    Metric                Value  Model
Reading Comprehension   RACE       Accuracy              82.87  Orca 2-13B
Reading Comprehension   RACE       Accuracy              80.79  Orca 2-7B
Transfer Learning       BBH-nlp    Average (%)           50.18  Orca 2-13B
Transfer Learning       BBH-nlp    Average (%)           45.93  Orca 2-7B
Question Answering      DROP Test  F1                    60.26  Orca 2-7B
Question Answering      DROP Test  F1                    57.97  Orca 2-13B
Question Answering      AGI Eval   Accuracy              49.93  Orca 2-13B
Question Answering      AGI Eval   Accuracy              45.1   Orca 2-7B
Common Sense Reasoning  BIG-bench  Accuracy              86.86  Orca 2-13B
Common Sense Reasoning  BIG-bench  Accuracy              84.31  Orca 2-7B
Multi-Task Learning     BBH-nlp    Average (%)           50.18  Orca 2-13B
Multi-Task Learning     BBH-nlp    Average (%)           45.93  Orca 2-7B
Arithmetic Reasoning    GSM8K      Accuracy              59.14  Orca 2 13B
Arithmetic Reasoning    GSM8K      Parameters (Billion)  13     Orca 2 13B
Arithmetic Reasoning    GSM8K      Accuracy              47.23  Orca 2 7B
Arithmetic Reasoning    GSM8K      Parameters (Billion)  7      Orca 2 7B
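
To make the size comparison in the results easier to read, the short Python sketch below computes the score gap between the 13B and 7B models for each dataset. The scores are copied from the results listed above. The dictionary layout and the helper name size_gaps are illustrative choices, not part of the paper or the site, and the sketch says nothing about how the scores were obtained.

```python
# Scores copied from the results listed above (higher is better).
# The structure and names here are illustrative assumptions.
RESULTS = {
    ("RACE", "Accuracy"): {"7B": 80.79, "13B": 82.87},
    ("BBH-nlp", "Average (%)"): {"7B": 45.93, "13B": 50.18},
    ("DROP Test", "F1"): {"7B": 60.26, "13B": 57.97},
    ("AGI Eval", "Accuracy"): {"7B": 45.1, "13B": 49.93},
    ("BIG-bench", "Accuracy"): {"7B": 84.31, "13B": 86.86},
    ("GSM8K", "Accuracy"): {"7B": 47.23, "13B": 59.14},
}


def size_gaps(results):
    """Return {dataset: 13B score minus 7B score}, rounded to 2 decimals."""
    return {
        dataset: round(scores["13B"] - scores["7B"], 2)
        for (dataset, _metric), scores in results.items()
    }


if __name__ == "__main__":
    # Print datasets from the largest gain for the 13B model to the smallest.
    ranked = sorted(size_gaps(RESULTS).items(), key=lambda kv: kv[1], reverse=True)
    for dataset, gap in ranked:
        print(f"{dataset:<10} {gap:+.2f}")
```

On these numbers the largest gap is on GSM8K, and DROP Test is the only listed dataset where the 7B model scores higher than the 13B model.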

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner (2025-07-17)
Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) (2025-07-17)
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)