RES-Q

RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale

Texts | MIT | Introduced 2024-06-24

RES-Q is a natural-language, instruction-based benchmark for evaluating $\textbf{R}$epository $\textbf{E}$diting $\textbf{S}$ystems. It consists of 100 handcrafted repository editing tasks derived from real GitHub commits. Given an edit instruction and a code repository, RES-Q evaluates an LLM system's ability to interpret the instruction, gather the relevant information from the repository, and construct the appropriate edits.

Benchmarks

Code Generation/pass@1
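
For readers unfamiliar with the pass@1 metric listed above, the following is a minimal sketch of how such a score is commonly computed. It uses the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of attempts at a task and c is the number that pass; for k = 1 this reduces to c / n. The function names and the `results` layout are illustrative assumptions, not part of the RES-Q tooling, and the benchmark's own harness may aggregate results differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: attempts made, c: attempts that passed, k: sample budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failures means every size-k sample contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[list[bool]]) -> float:
    """Mean pass@1 over tasks (hypothetical layout).

    results[i] holds the pass/fail outcome of each attempt at task i.
    """
    if not results:
        raise ValueError("results must contain at least one task")
    per_task = [pass_at_k(len(r), sum(r), 1) for r in results]
    return sum(per_task) / len(per_task)


# Four tasks with one attempt each, three of which pass.
score = benchmark_pass_at_1([[True], [False], [True], [True]])
```

With a single attempt per task, pass@1 is simply the fraction of tasks whose edit passes, so on a 100-task benchmark such as RES-Q each solved task would contribute one percentage point under this assumption.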