ComplexCodeEval

Introduced 2024-09-16


ComplexCodeEval is an evaluation benchmark designed to support multiple downstream tasks, accurately reflect different programming environments, and deliberately avoid data leakage. It includes a diverse set of samples drawn from real-world projects so that it closely mirrors actual development scenarios.

Overview

ComplexCodeEval consists of:

  • 3,897 Java samples from 1,055 code repositories
  • 7,184 Python samples from 2,107 code repositories

Key Features

  1. Diverse Downstream Tasks: The benchmark supports multiple downstream tasks to evaluate the performance of different code analysis tools and models.
  2. Accurate Reflection of Programming Environments: Samples are selected from projects that use popular third-party frameworks and packages.
  3. Avoidance of Data Leakage: Each sample is annotated with multiple timestamps, so evaluators can restrict testing to samples created after a model's training-data cutoff and avoid data leakage.
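The timestamp-based approach to avoiding leakage can be illustrated with a small filter. This is a minimal sketch, not the benchmark's own tooling: the `Sample` record, its `timestamps` dictionary, and the field names `function_created` and `function_updated` are hypothetical stand-ins for whatever per-sample timestamps the dataset actually provides. The filter keeps only samples whose selected timestamps all fall strictly after a chosen training cutoff, and conservatively drops samples that lack any selected timestamp.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, List, Sequence


@dataclass
class Sample:
    # Hypothetical record layout; not the benchmark's actual schema.
    sample_id: str
    language: str
    timestamps: Dict[str, date] = field(default_factory=dict)


def leakage_free(samples: Iterable[Sample], cutoff: date, keys: Sequence[str]) -> List[Sample]:
    """Keep samples whose selected timestamps all fall strictly after `cutoff`.

    Samples missing any selected timestamp are dropped, because their
    position relative to the cutoff cannot be established.
    """
    kept = []
    for s in samples:
        stamps = [s.timestamps.get(k) for k in keys]
        if all(t is not None and t > cutoff for t in stamps):
            kept.append(s)
    return kept


# Illustrative data only (hypothetical samples and field names).
samples = [
    Sample("a", "python", {"function_created": date(2023, 5, 1), "function_updated": date(2023, 6, 1)}),
    Sample("b", "java", {"function_created": date(2021, 1, 1), "function_updated": date(2023, 8, 1)}),
    Sample("c", "python", {"function_updated": date(2024, 1, 1)}),
]

strict = leakage_free(samples, date(2022, 12, 31), ("function_created", "function_updated"))
print([s.sample_id for s in strict])  # ['a']
```

Checking several timestamps at once is the stricter choice: sample `b` was last updated after the cutoff, but its function already existed before it and could have appeared in training data. Passing only `("function_updated",)` would relax the filter and keep all three samples.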