CodeInstruct

InstructCoder, CodeInstruct

Texts

Introduced 2023-05-23

InstructCoder is the first dataset designed to adapt LLMs for general code editing. It consists of over 100k instruction-input-output triplets, generated by ChatGPT, that cover multiple distinct code editing scenarios. LLaMA-33B fine-tuned on InstructCoder performs on par with ChatGPT on a real-world test set derived from GitHub commits.
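To make the triplet format concrete, here is a minimal Python sketch of how one instruction-input-output record might be represented and turned into a fine-tuning prompt. The field names, the `EditTriplet` class, the `build_prompt` helper, and the prompt template are all assumptions for illustration, not the dataset's documented schema, and the sample record is invented rather than taken from InstructCoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditTriplet:
    """One code-editing sample (hypothetical field names)."""
    instruction: str  # natural-language description of the requested edit
    input: str        # source code before the edit
    output: str       # source code after the edit


def build_prompt(sample: EditTriplet) -> str:
    """Format the instruction and input code as a prompt (assumed template).

    The target output is left out so a model can be trained or asked to
    generate it after the final header.
    """
    return (
        f"### Instruction:\n{sample.instruction}\n\n"
        f"### Input:\n{sample.input}\n\n"
        f"### Output:\n"
    )


# Invented record, for illustration only.
example = EditTriplet(
    instruction="Rename the function `add` to `sum_two`.",
    input="def add(a, b):\n    return a + b\n",
    output="def sum_two(a, b):\n    return a + b\n",
)

prompt = build_prompt(example)
```

During fine-tuning, `example.output` would serve as the target text that follows the prompt; at evaluation time the model's completion would be compared against it.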