Ask, Fail, Repeat: Meeseeks, an Iterative Feedback Benchmark for LLMs' Multi-turn Instruction-Following Ability

JiaMing Wang, Yunke Zhao, Peng Ding, Jun Kuang, ZongYu Wang, Xuezhi Cao, Xunliang Cai

2025-04-30
Instruction Following, Intent Recognition

Abstract

The ability to follow instructions accurately is fundamental for Large Language Models (LLMs) to serve as reliable agents in real-world applications. For complex instructions, LLMs often struggle to fulfill all requirements in a single attempt. In practice, users typically provide iterative feedback until the LLM generates a response that meets all requirements. However, existing instruction-following benchmarks are either single-turn or introduce new requirements in each turn without allowing self-correction. To address this gap, we propose Meeseeks. Meeseeks simulates realistic human-LLM interactions through an iterative feedback framework, which enables models to self-correct based on the specific requirements they failed in each turn and so better reflects real-world usage patterns. The benchmark also implements a comprehensive evaluation system with 38 capability tags organized across three dimensions: Intent Recognition, Granular Content Validation, and Output Structure Validation. Through rigorous evaluation across LLMs, Meeseeks provides valuable insights into LLMs' instruction-following capabilities in multi-turn scenarios.
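The iterative feedback idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `failed_requirements`, `iterative_feedback`, `toy_model`, and `REQS`, the feedback message format, and the turn limit are all hypothetical, and the benchmark's actual evaluator and its 38 capability tags are not reproduced here. The sketch only shows the loop shape: check each requirement, report the failed ones back to the model, and let it retry.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a requirement is a name plus a checker
# that takes the response text and returns True when it is satisfied.
Requirement = Tuple[str, Callable[[str], bool]]
Model = Callable[[List[str]], str]


def failed_requirements(response: str, requirements: List[Requirement]) -> List[str]:
    """Return the names of the requirements the response does not meet."""
    return [name for name, check in requirements if not check(response)]


def iterative_feedback(
    model: Model,
    prompt: str,
    requirements: List[Requirement],
    max_turns: int = 3,  # assumed limit, chosen for illustration only
) -> List[List[str]]:
    """Run the feedback loop and return the failed requirements per turn."""
    history = [prompt]
    per_turn_failures: List[List[str]] = []
    for _ in range(max_turns):
        response = model(history)
        failures = failed_requirements(response, requirements)
        per_turn_failures.append(failures)
        if not failures:
            break  # every requirement is met, so no further feedback is needed
        # Feed the specific failures back so the model can self-correct.
        history.append(response)
        history.append("Unmet requirements: " + ", ".join(failures))
    return per_turn_failures


# Toy stand-ins (assumptions): two simple checks and a fake model that
# only produces an uppercase answer after it has received feedback.
REQS: List[Requirement] = [
    ("uppercase", lambda text: text.isupper()),
    ("two_words", lambda text: len(text.split()) >= 2),
]


def toy_model(history: List[str]) -> str:
    got_feedback = any(m.startswith("Unmet requirements:") for m in history)
    return "HELLO WORLD" if got_feedback else "hello world"


if __name__ == "__main__":
    print(iterative_feedback(toy_model, "Greet the world in capitals.", REQS))
    # prints [['uppercase'], []]
```

In this toy run the first turn fails only the `uppercase` check, and the second turn passes both checks after the feedback message. Recording the failures per turn is what allows instruction-following ability to be compared across turns rather than from a single attempt.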
