Tom Zehle, Moritz Schlager, Timo Heiß, Matthias Feurer
Large language models (LLMs) have revolutionized natural language processing by solving a wide range of tasks guided simply by a prompt. Yet their performance is highly sensitive to prompt formulation. While automated prompt optimization addresses this challenge by finding optimal prompts, current methods require a substantial number of LLM calls and input tokens, making prompt optimization expensive. We introduce CAPO (Cost-Aware Prompt Optimization), an algorithm that enhances prompt optimization efficiency by integrating AutoML techniques. CAPO is an evolutionary approach with LLMs as operators, incorporating racing to save evaluations and multi-objective optimization to balance performance with prompt length. It jointly optimizes instructions and few-shot examples while leveraging task descriptions for improved robustness. Our extensive experiments across diverse datasets and LLMs demonstrate that CAPO outperforms state-of-the-art discrete prompt optimization methods in 11/15 cases, with improvements of up to 21 percentage points. Our algorithm achieves better performance already at smaller budgets, saves evaluations through racing, and decreases average prompt length via a length penalty, making it both cost-efficient and cost-aware. Even without few-shot examples, CAPO outperforms its competitors and generally remains robust to initial prompts. CAPO represents an important step toward making prompt optimization more powerful and accessible by improving cost-efficiency.
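To make the two cost-saving ideas concrete, the sketch below illustrates racing combined with a length penalty in a toy setting. This is not CAPO's actual implementation: the candidate prompts, their hidden accuracies, the block sizes, and the `margin`/`gamma` thresholds are all invented for illustration, and a cheap random draw stands in for a real LLM evaluation.

```python
import random

random.seed(0)

# Toy stand-in for LLM evaluation: each candidate prompt has a hidden
# "true" accuracy; evaluating one instance returns a noisy 0/1 outcome.
TRUE_ACC = {
    "short prompt": 0.70,
    "a somewhat longer prompt": 0.85,
    "a very very long and verbose prompt": 0.86,
}

def evaluate_instance(prompt):
    """Simulate scoring one test instance with the given prompt."""
    return 1 if random.random() < TRUE_ACC[prompt] else 0

def racing_select(prompts, n_blocks=20, block_size=10, gamma=0.001, margin=0.05):
    """Race candidates: evaluate in blocks, dropping clear losers early.

    Fitness trades accuracy against prompt length (gamma is the length
    penalty), mirroring the multi-objective idea in the abstract.
    """
    scores = {p: [] for p in prompts}
    alive = list(prompts)
    evals = 0
    for _ in range(n_blocks):
        for p in alive:
            scores[p] += [evaluate_instance(p) for _ in range(block_size)]
            evals += block_size
        # Length-penalized fitness on the evidence gathered so far.
        fitness = {p: sum(scores[p]) / len(scores[p]) - gamma * len(p)
                   for p in alive}
        best = max(fitness.values())
        # Racing step: discard candidates trailing the leader by > margin,
        # so they consume no further evaluations.
        alive = [p for p in alive if fitness[p] >= best - margin]
        if len(alive) == 1:
            break
    winner = max(alive, key=lambda p: fitness[p])
    return winner, evals

winner, evals = racing_select(list(TRUE_ACC))
print(f"selected: {winner!r} after {evals} instance evaluations")
```

Without racing, every candidate would be evaluated on all blocks (here 600 instance evaluations for three candidates); racing spends that budget only on candidates still plausibly competitive, which is where the evaluation savings come from.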
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Sentiment Analysis | SST-5 Fine-grained classification | Accuracy | 62.27 | Llama-3.3-70B + CAPO |
| Sentiment Analysis | SST-5 Fine-grained classification | Accuracy | 60.2 | Mistral-Small-24B + CAPO |
| Sentiment Analysis | SST-5 Fine-grained classification | Accuracy | 59.07 | Qwen2.5-32B + CAPO |
| Subjectivity Analysis | SUBJ | Accuracy | 91.6 | Llama-3.3-70B + CAPO |
| Subjectivity Analysis | SUBJ | Accuracy | 91 | Qwen2.5-32B + CAPO |
| Subjectivity Analysis | SUBJ | Accuracy | 81.67 | Mistral-Small-24B + CAPO |
| Text Classification | Bala-Copa | Accuracy | 98.47 | Qwen2.5-32B + CAPO |
| Text Classification | Bala-Copa | Accuracy | 98.27 | Llama-3.3-70B + CAPO |
| Text Classification | Bala-Copa | Accuracy | 95.13 | Mistral-Small-24B + CAPO |
| Text Classification | AG News | Error | 11.2 | Llama-3.3-70B + CAPO |
| Text Classification | AG News | Error | 12.93 | Qwen2.5-32B + CAPO |
| Text Classification | AG News | Error | 15.7 | Mistral-Small-24B + CAPO |
| Arithmetic Reasoning | GSM8K | Accuracy | 73.73 | Llama-3.3-70B + CAPO |
| Arithmetic Reasoning | GSM8K | Accuracy | 65.07 | Mistral-Small-24B + CAPO |
| Arithmetic Reasoning | GSM8K | Accuracy | 60.2 | Qwen2.5-32B + CAPO |