ComplexCodeEval

ComplexCodeEval is an evaluation benchmark designed to support multiple downstream tasks, reflect real programming environments, and avoid data leakage. Its samples are drawn from real-world projects so that evaluation closely mirrors actual development scenarios.

Overview

ComplexCodeEval consists of:

3,897 Java samples from 1,055 code repositories
7,184 Python samples from 2,107 code repositories

Key Features

Diverse Downstream Tasks: The benchmark supports multiple downstream tasks to evaluate the performance of different code analysis tools and models.
Accurate Reflection of Programming Environments: Samples are selected from projects that use popular third-party frameworks and packages.
Avoidance of Data Leakage: Each sample carries multiple timestamps, so samples can be compared against a model's training cutoff to prevent data leakage.
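
As an illustration of how per-sample timestamps can guard against leakage, the sketch below keeps only samples whose timestamps all fall after a model's training cutoff. The record layout, the `timestamps` field, and the `filter_after_cutoff` helper are assumptions made for this example and are not the benchmark's documented schema or API.

```python
from datetime import date

# Hypothetical sample records. The "id" and "timestamps" fields are
# assumptions for illustration, not the benchmark's actual schema.
samples = [
    {"id": "a", "timestamps": [date(2021, 5, 1), date(2022, 3, 10)]},
    {"id": "b", "timestamps": [date(2023, 6, 1), date(2023, 9, 15)]},
    {"id": "c", "timestamps": [date(2022, 12, 1), date(2023, 8, 2)]},
]


def filter_after_cutoff(samples, cutoff):
    """Keep samples whose every timestamp is strictly later than the cutoff."""
    return [s for s in samples if all(t > cutoff for t in s["timestamps"])]


# Assumed training cutoff of the model under evaluation.
cutoff = date(2023, 1, 1)
kept = filter_after_cutoff(samples, cutoff)
print([s["id"] for s in kept])  # prints ['b']
```

Requiring every timestamp to postdate the cutoff is the conservative choice: sample "c" is dropped because one of its timestamps predates the cutoff, even though another does not. A looser policy could compare only the most recent timestamp.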