Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation

Han He, Qianchu Liu, Lei Xu, Chaitanya Shivade, Yi Zhang, Sundararajan Srinivasan, Katrin Kirchhoff

2024-10-03 | Text Generation, Abstractive Text Summarization, Text Summarization, Prompt Engineering, Hallucination

Abstract

Existing automatic prompt engineering methods are typically designed for discriminative tasks, where new task prompts are iteratively refined with limited feedback from a single metric reflecting a single aspect. However, these approaches are suboptimal for generative tasks, which require more nuanced guidance beyond a single numeric metric to improve the prompt and optimize multiple aspects of the generated text. To address these challenges, we propose a novel multi-aspect Critique-Suggestion-guided automatic Prompt Optimization (CriSPO) approach. CriSPO introduces a critique-suggestion module as its core component. This module spontaneously discovers aspects, and compares generated and reference texts across these aspects, providing specific suggestions for prompt modification. These clear critiques and actionable suggestions guide a receptive optimizer module to make more substantial changes, exploring a broader and more effective search space. To further improve CriSPO with multi-metric optimization, we introduce an Automatic Suffix Tuning (AST) extension to enhance the performance of task prompts across multiple metrics. We evaluate CriSPO on 4 state-of-the-art LLMs across 4 summarization and 5 QA datasets. Extensive experiments show 3-4% ROUGE score improvement on summarization and substantial improvement of various metrics on QA. Code available at https://github.com/amazon-science/crispo
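
The loop described in the abstract (generate, critique against references, suggest changes, rewrite the prompt) can be illustrated with a minimal sketch. Everything below is hypothetical and is not taken from the CriSPO codebase: the function names, the signatures, and the toy stand-ins for the LLM calls are assumptions chosen so the example runs offline. It does not cover Automatic Suffix Tuning.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (source text, reference output)


@dataclass
class Candidate:
    prompt: str
    score: float


def optimize_prompt(
    seed_prompt: str,
    train: List[Example],
    generate: Callable[[str, str], str],                # (prompt, source) -> output
    critique: Callable[[str, List[str], List[str]], str],  # -> critique + suggestions
    rewrite: Callable[[str, str], str],                 # (prompt, feedback) -> new prompt
    metric: Callable[[str, str], float],                # (output, reference) -> score
    steps: int = 3,
) -> Candidate:
    """Hypothetical critique-suggestion loop; keeps the best-scoring prompt."""
    references = [ref for _, ref in train]

    def evaluate(prompt: str) -> Tuple[List[str], float]:
        outputs = [generate(prompt, src) for src, _ in train]
        total = sum(metric(out, ref) for out, ref in zip(outputs, references))
        return outputs, total / len(train)

    prompt = seed_prompt
    outputs, score = evaluate(prompt)
    best = Candidate(prompt, score)
    for _ in range(steps):
        # Compare generated and reference texts, then turn the feedback into a new prompt.
        feedback = critique(prompt, outputs, references)
        prompt = rewrite(prompt, feedback)
        outputs, score = evaluate(prompt)
        if score > best.score:
            best = Candidate(prompt, score)
    return best


# Toy stand-ins for the LLM calls (assumptions, for illustration only).
def toy_generate(prompt: str, source: str) -> str:
    return source.upper() if "uppercase" in prompt else source


def toy_critique(prompt: str, outputs: List[str], references: List[str]) -> str:
    if outputs != references:
        return "Outputs differ from references in letter case; ask for uppercase."
    return "No issues found."


def toy_rewrite(prompt: str, feedback: str) -> str:
    if "uppercase" in feedback and "uppercase" not in prompt:
        return prompt + " Respond in uppercase."
    return prompt


def exact_match(output: str, reference: str) -> float:
    return 1.0 if output == reference else 0.0


if __name__ == "__main__":
    data = [("hello", "HELLO"), ("world", "WORLD")]
    best = optimize_prompt("Copy the input.", data, toy_generate,
                           toy_critique, toy_rewrite, exact_match)
    print(best)
```

In a real setting the three callables would wrap LLM calls and the metric would be something like ROUGE; the toy versions only show how free-text feedback, rather than a single number, drives the prompt rewrite.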

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Text Summarization | SAMSum | ROUGE-1 | 47.2 | CriSPO 3-shot |
| Text Summarization | SAMSum | ROUGE-2 | 20.8 | CriSPO 3-shot |
| Text Summarization | SAMSum | ROUGE-L | 38.2 | CriSPO 3-shot |
| Text Summarization | ACI-Bench | ROUGE-1 | 63.1 | CriSPO 3-shot |
| Text Summarization | ACI-Bench | ROUGE-2 | 32.5 | CriSPO 3-shot |
| Text Summarization | ACI-Bench | ROUGE-L | 41 | CriSPO 3-shot |
| Text Summarization | MeetingBank | ROUGE-1 | 58.5 | CriSPO 3-shot |
| Text Summarization | MeetingBank | ROUGE-2 | 46.5 | CriSPO 3-shot |
| Text Summarization | MeetingBank | ROUGE-L | 54.1 | CriSPO 3-shot |
| Text Summarization | CNN / Daily Mail | ROUGE-1 | 42.1 | CriSPO 3-shot |
| Text Summarization | CNN / Daily Mail | ROUGE-2 | 17 | CriSPO 3-shot |
| Text Summarization | CNN / Daily Mail | ROUGE-L | 27.4 | CriSPO 3-shot |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-1 | 42.1 | CriSPO 3-shot |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-2 | 17 | CriSPO 3-shot |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-L | 27.4 | CriSPO 3-shot |
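
For readers unfamiliar with the metric, the following is a simplified sketch of how a ROUGE-1 F1 score can be computed from unigram overlap. It assumes lowercasing and whitespace tokenization only; standard ROUGE implementations apply their own tokenization and optional stemming, so this function will not reproduce the values above exactly. The function name is hypothetical, and the reading of the reported values as F1 scores on a 0-100 scale is an assumption.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, no stemming."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
    print(f"{100 * score:.1f}")  # 5 of 6 unigrams match on each side
```

ROUGE-2 follows the same pattern over bigrams, and ROUGE-L replaces the overlap count with the length of the longest common subsequence.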

Related Papers

Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
Leveraging Language Prior for Infrared Small Target Detection (2025-07-17)
Emotional Support with LLM-based Empathetic Dialogue Generation (2025-07-17)
Mitigating Object Hallucinations via Sentence-Level Early Intervention (2025-07-16)
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs (2025-07-15)
Seq vs Seq: An Open Suite of Paired Encoders and Decoders (2025-07-15)
Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)
LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)