Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CAPO: Cost-Aware Prompt Optimization

Tom Zehle, Moritz Schlager, Timo Heiß, Matthias Feurer

2025-04-22 · Text Classification · Subjectivity Analysis · Sentiment Analysis · AutoML · Arithmetic Reasoning

Paper · PDF · Code (official)

Abstract

Large language models (LLMs) have revolutionized natural language processing by solving a wide range of tasks guided simply by a prompt. Yet their performance is highly sensitive to prompt formulation. While automated prompt optimization addresses this challenge by finding optimal prompts, current methods require a substantial number of LLM calls and input tokens, making prompt optimization expensive. We introduce CAPO (Cost-Aware Prompt Optimization), an algorithm that enhances prompt optimization efficiency by integrating AutoML techniques. CAPO is an evolutionary approach with LLMs as operators, incorporating racing to save evaluations and multi-objective optimization to balance performance with prompt length. It jointly optimizes instructions and few-shot examples while leveraging task descriptions for improved robustness. Our extensive experiments across diverse datasets and LLMs demonstrate that CAPO outperforms state-of-the-art discrete prompt optimization methods in 11 of 15 cases, with improvements of up to 21 percentage points. Our algorithm already achieves better performance with smaller budgets, saves evaluations through racing, and decreases average prompt length via a length penalty, making it both cost-efficient and cost-aware. Even without few-shot examples, CAPO outperforms its competitors and generally remains robust to initial prompts. CAPO represents an important step toward making prompt optimization more powerful and accessible by improving cost-efficiency.
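Two of the abstract's core ideas, racing to save evaluations and a length penalty for balancing accuracy against prompt length, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the linear penalty form, the block-wise elimination schedule, and the `eval_fn` interface are all assumptions.

```python
def score(accuracy, prompt, gamma=0.01):
    """Length-penalized objective: accuracy minus a penalty per token.

    The linear penalty with weight `gamma` is an illustrative stand-in
    for CAPO's multi-objective treatment of performance vs. prompt length.
    """
    return accuracy - gamma * len(prompt.split())

def race(prompts, eval_fn, dataset, block_size=20, keep=2):
    """Racing sketch: evaluate candidates on successive data blocks and
    eliminate clearly worse candidates early, saving LLM calls.

    `eval_fn(prompt, example)` is a hypothetical hook that returns 1 for
    a correct prediction and 0 otherwise.
    """
    results = {p: [] for p in prompts}
    for start in range(0, len(dataset), block_size):
        block = dataset[start:start + block_size]
        # Spend evaluations only on surviving candidates.
        for p in results:
            results[p].extend(eval_fn(p, x) for x in block)
        # Rank by penalized mean accuracy, then drop the worse half,
        # never going below `keep` survivors.
        ranked = sorted(
            results,
            key=lambda p: score(sum(results[p]) / len(results[p]), p),
            reverse=True,
        )
        n_next = max(keep, len(ranked) // 2)
        results = {p: results[p] for p in ranked[:n_next]}
        if len(results) <= keep:
            break
    return ranked[:keep]
```

In an evolutionary loop, the survivors returned by `race` would serve as parents for the next generation of LLM-mutated prompts, so weak candidates never consume a full evaluation budget.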

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Sentiment Analysis | SST-5 (fine-grained classification) | Accuracy | 62.27 | Llama-3.3-70B + CAPO |
| Sentiment Analysis | SST-5 (fine-grained classification) | Accuracy | 60.2 | Mistral-Small-24B + CAPO |
| Sentiment Analysis | SST-5 (fine-grained classification) | Accuracy | 59.07 | Qwen2.5-32B + CAPO |
| Subjectivity Analysis | SUBJ | Accuracy | 91.6 | Llama-3.3-70B + CAPO |
| Subjectivity Analysis | SUBJ | Accuracy | 91 | Qwen2.5-32B + CAPO |
| Subjectivity Analysis | SUBJ | Accuracy | 81.67 | Mistral-Small-24B + CAPO |
| Text Classification | Bala-Copa | Accuracy | 98.47 | Qwen2.5-32B + CAPO |
| Text Classification | Bala-Copa | Accuracy | 98.27 | Llama-3.3-70B + CAPO |
| Text Classification | Bala-Copa | Accuracy | 95.13 | Mistral-Small-24B + CAPO |
| Text Classification | AG News | Error | 11.2 | Llama-3.3-70B + CAPO |
| Text Classification | AG News | Error | 12.93 | Qwen2.5-32B + CAPO |
| Text Classification | AG News | Error | 15.7 | Mistral-Small-24B + CAPO |
| Arithmetic Reasoning | GSM8K | Accuracy | 73.73 | Llama-3.3-70B + CAPO |
| Arithmetic Reasoning | GSM8K | Accuracy | 65.07 | Mistral-Small-24B + CAPO |
| Arithmetic Reasoning | GSM8K | Accuracy | 60.2 | Qwen2.5-32B + CAPO |

Related Papers

- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
- Imbalanced Regression Pipeline Recommendation (2025-07-16)
- AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles (2025-07-15)
- DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)
- SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning (2025-07-14)
- GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation (2025-07-10)
- DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification (2025-07-08)