Syntax-Aware Fill-in-the-Middle (SAFIM) is a benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task. SAFIM has three subtasks: Algorithmic Block Completion, Control-Flow Expression Completion, and API Function Call Completion. SAFIM is sourced from code submitted from April 2022 to January 2023 to minimize the impact of data contamination on evaluation results.
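To make the FIM task concrete, here is a minimal sketch of how a single example can be framed: a span of code is masked out, and the model must produce it given the text before and after. The helper names (`make_fim_example`, `build_psm_prompt`) and the sentinel strings `<PRE>`, `<SUF>`, `<MID>` are hypothetical placeholders chosen for illustration. They are not SAFIM's actual prompt format or any particular model's special tokens. The masked span here is a control-flow condition, loosely in the spirit of the Control-Flow Expression Completion subtask.

```python
def make_fim_example(code: str, start: int, end: int) -> dict:
    """Split `code` into prefix / middle / suffix at character offsets [start, end)."""
    if not 0 <= start <= end <= len(code):
        raise ValueError("span must lie inside the code string")
    return {
        "prefix": code[:start],      # context before the masked span
        "middle": code[start:end],   # ground-truth completion the model should produce
        "suffix": code[end:],        # context after the masked span
    }


def build_psm_prompt(example: dict, pre: str = "<PRE>", suf: str = "<SUF>", mid: str = "<MID>") -> str:
    """Assemble a prefix-suffix-middle prompt; the sentinel strings are placeholders."""
    return f"{pre}{example['prefix']}{suf}{example['suffix']}{mid}"


source = (
    "def is_even(n):\n"
    "    if n % 2 == 0:\n"
    "        return True\n"
    "    return False\n"
)

# Mask the condition of the `if` statement.
target = "n % 2 == 0"
start = source.index(target)
example = make_fim_example(source, start, start + len(target))
prompt = build_psm_prompt(example)
```

A model would be asked to continue `prompt`, and its output would be compared against `example["middle"]`. How SAFIM itself chooses spans and scores completions is described in the paper and repository linked on this page.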

Authors: Linyuan Gong, Sida Wang, Mostafa Elhoushi, Alvin Cheung
Paper: https://arxiv.org/abs/2403.04814
Hugging Face Dataset: https://huggingface.co/datasets/gonglinyuan/safim
Leaderboard: https://safimbenchmark.com
Code & Submission Instructions: https://github.com/gonglinyuan/safim

The SAFIM benchmark is partially derived from problem descriptions and code solutions published on https://codeforces.com. According to the Codeforces license, you may publish the texts of Codeforces problems in any open source, but you must preserve a direct link to the site.