RES-Q

RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale

Texts | MIT | Introduced 2024-06-24

RES-Q is a natural language instruction-based benchmark for evaluating Repository Editing Systems. It consists of 100 handcrafted repository editing tasks derived from real GitHub commits. Given an edit instruction and a code repository, RES-Q evaluates an LLM system’s ability to interpret edit instructions, gather information, and construct appropriate edits to the repository.
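
To make the task shape concrete, here is a minimal Python sketch of an instruction-plus-repository evaluation loop. It is an illustration under stated assumptions, not the RES-Q harness: the `EditTask` record, the `check` success criterion, the `evaluate` function, and the `toy_system` stand-in are all hypothetical names, and the repository is modeled as a simple mapping from file path to file contents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a repository is modeled as {file path: file contents}.
Repo = Dict[str, str]


@dataclass
class EditTask:
    """Hypothetical task record; not the actual RES-Q schema."""
    instruction: str                # natural language edit instruction
    repo: Repo                      # repository state before the edit
    check: Callable[[Repo], bool]   # assumed success criterion for the edit


def evaluate(system: Callable[[str, Repo], Repo], tasks: List[EditTask]) -> float:
    """Return the fraction of tasks whose edited repository passes its check."""
    if not tasks:
        return 0.0
    passed = 0
    for task in tasks:
        # Hand the system a copy so the original task state stays untouched.
        edited = system(task.instruction, dict(task.repo))
        if task.check(edited):
            passed += 1
    return passed / len(tasks)


def toy_system(instruction: str, repo: Repo) -> Repo:
    """Stand-in for an LLM system: blindly renames `foo` to `bar` in every file."""
    return {path: text.replace("foo", "bar") for path, text in repo.items()}


tasks = [
    EditTask(
        "Rename foo to bar",
        {"a.py": "def foo(): pass\n"},
        lambda r: "def bar" in r["a.py"],
    ),
    EditTask(
        "Add a docstring to baz",
        {"b.py": "def baz(): pass\n"},
        lambda r: '"""' in r["b.py"],
    ),
]

score = evaluate(toy_system, tasks)
print(f"fraction of tasks passed: {score}")
```

The toy system handles the rename task but ignores the docstring task, so the sketch exercises both a passing and a failing case. A real system would need to read the instruction, locate the relevant files, and produce the edit itself.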