Automatic Model Selection with Large Language Models for Reasoning

James Xu Zhao, Yuxi Xie, Kenji Kawaguchi, Junxian He, Michael Qizhe Xie

2023-05-23

Math Word Problem Solving, Model Selection, GSM8K, Large Language Model, Arithmetic Reasoning, Language Modelling

Abstract

Chain-of-Thought (CoT) and Program-Aided Language Models (PAL) represent two distinct reasoning methods, each with its own strengths. CoT employs natural language, offering flexibility and interpretability, while PAL utilizes programming language, yielding more structured and rigorous logic. We introduce a model selection method to combine the best of both worlds by employing a large language model (LLM) to dynamically select between them. Our theoretical analysis underscores the feasibility of this method, which is further corroborated by empirical results. Our proposed method demonstrates significant performance improvements across eight reasoning datasets with Codex, ChatGPT, and GPT-4. Additionally, our method is complementary to self-consistency; when integrated, it can further enhance performance while significantly reducing computation costs. Moreover, we achieve new state-of-the-art results on GSM8K and SVAMP, with respective accuracies of 96.8% and 93.7%. Our code, data and prompts are available at https://github.com/XuZhao0/Model-Selection-Reasoning
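The selection idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation (see their repository for that). The names `select_answer`, `solve_cot`, `solve_pal`, and `choose` are hypothetical placeholders for calls to an LLM, and skipping the selector when both methods already agree is an assumption made here for illustration.

```python
from typing import Callable

# Hypothetical type aliases: each solver maps a question to a final answer string.
Solver = Callable[[str], str]
# Hypothetical selector: given the question and both candidate answers,
# it returns the label "cot" or "pal" (in practice, an LLM prompt).
Selector = Callable[[str, str, str], str]


def select_answer(question: str, solve_cot: Solver, solve_pal: Solver,
                  choose: Selector) -> str:
    """Return one final answer by selecting between CoT and PAL outputs."""
    cot_answer = solve_cot(question)
    pal_answer = solve_pal(question)

    # Assumption: if both methods agree, no selection step is needed.
    if cot_answer == pal_answer:
        return cot_answer

    # Otherwise, ask the selector which candidate to trust.
    choice = choose(question, cot_answer, pal_answer)
    if choice == "cot":
        return cot_answer
    if choice == "pal":
        return pal_answer
    raise ValueError(f"unexpected selector output: {choice!r}")
```

Passing the solvers and the selector in as plain callables keeps the sketch independent of any particular LLM API, so each one can be replaced by a stub when testing.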

Results

Task	Dataset	Metric	Value	Model
Question Answering	SVAMP	Execution Accuracy	93.7	GPT-4 (Model Selection)
Math Word Problem Solving	SVAMP	Execution Accuracy	93.7	GPT-4 (Model Selection)
Mathematical Question Answering	SVAMP	Execution Accuracy	93.7	GPT-4 (Model Selection)
Mathematical Reasoning	SVAMP	Execution Accuracy	93.7	GPT-4 (Model Selection)
