TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Skills-in-Context Prompting: Unlocking Compositionality in...

Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models

Jiaao Chen, Xiaoman Pan, Dian Yu, Kaiqiang Song, Xiaoyang Wang, Dong Yu, Jianshu Chen

2023-08-01Mathematical ReasoningMathMath Word Problem SolvingSystematic Generalization
PaperPDF

Abstract

We investigate how to elicit compositional generalization capabilities in large language models (LLMs). Compositional generalization empowers LLMs to solve complex problems by combining foundational skills, a critical reasoning ability akin to human intelligence. However, even the most advanced LLMs currently struggle with this form of reasoning. We examine this problem within the framework of in-context learning and find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial. We refer to this prompt structure as skills-in-context (SKiC). With as few as two exemplars, this in-context learning structure enables LLMs to tackle more challenging problems requiring innovative skill combinations, achieving near-perfect systematic generalization across a broad range of tasks. Intriguingly, SKiC also unlocks the latent potential of LLMs, allowing them to more actively utilize pre-existing internal skills acquired during earlier pretraining stages to solve complex reasoning problems. The SKiC structure is robust across different skill constructions and exemplar choices and demonstrates strong transferability to new tasks. Finally, inspired by our in-context learning study, we show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization, enabling the models to solve much harder problems directly with standard prompting.

Results

TaskDatasetMetricValueModel
Question AnsweringMATHAccuracy56.4SKiC (GPT-4 model)
Math Word Problem SolvingMATHAccuracy56.4SKiC (GPT-4 model)
Mathematical Question AnsweringMATHAccuracy56.4SKiC (GPT-4 model)
Mathematical ReasoningMATHAccuracy56.4SKiC (GPT-4 model)

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks2025-07-17QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation2025-07-17A Survey of Deep Learning for Geometry Problem Solving2025-07-16Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training2025-07-16KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?2025-07-15Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding2025-07-15Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing2025-07-15Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination2025-07-14