ComplexCodeEval
Introduced 2024-09-16
ComplexCodeEval is an evaluation benchmark designed to accommodate multiple downstream tasks, accurately reflect different programming environments, and deliberately avoid data leakage issues. This benchmark includes a diverse set of samples from real-world projects, aiming to closely mirror actual development scenarios.
Overview
ComplexCodeEval consists of:
- 3,897 Java samples from 1,055 code repositories
- 7,184 Python samples from 2,107 code repositories
Key Features
- Diverse Downstream Tasks: The benchmark supports multiple downstream tasks to evaluate the performance of different code analysis tools and models.
- Accurate Reflection of Programming Environments: Samples are selected from projects that use popular third-party frameworks and packages.
- Avoidance of Data Leakage: Each sample carries multiple timestamps, so samples that may overlap with a model's training data can be identified and excluded.
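As a rough illustration of how the per-sample timestamps could be used to guard against leakage, the sketch below keeps only samples dated after a model's training cutoff. The record layout, the `timestamps` field name, and the `filter_after_cutoff` helper are assumptions made for this example and are not taken from the benchmark's actual schema or tooling.

```python
from datetime import date

# Hypothetical sample records; the field names are assumptions for
# illustration, not the benchmark's actual schema.
samples = [
    {"id": "a", "timestamps": [date(2023, 1, 5), date(2023, 6, 1)]},
    {"id": "b", "timestamps": [date(2024, 2, 10), date(2024, 3, 1)]},
    {"id": "c", "timestamps": [date(2023, 11, 20), date(2024, 4, 2)]},
]


def filter_after_cutoff(samples, cutoff):
    """Keep samples whose earliest timestamp is strictly after the cutoff."""
    return [s for s in samples if min(s["timestamps"]) > cutoff]


# Assume a model whose training data ends on 2023-12-31.
kept = filter_after_cutoff(samples, date(2023, 12, 31))
print([s["id"] for s in kept])
```

Comparing against the earliest timestamp is the conservative choice: a sample is dropped if any of its recorded dates falls on or before the cutoff. A looser policy could compare against the latest timestamp instead.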