BigCodeBench
Apache License 2.0 · Introduced 2024-06-22
BigCodeBench is an easy-to-use benchmark for code generation with practical and challenging programming tasks¹. It aims to evaluate the programming capabilities of large language models (LLMs) in a more realistic setting¹. Its tasks are HumanEval-like function-level code generation problems, but with much more complex instructions and more diverse function calls¹.
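To make the contrast with HumanEval concrete, the sketch below shows what a task in this style can look like. It is a hypothetical example rather than an actual benchmark item; the entry-point name `task_func` and the exact specification are illustrative. The point is that a single function-level prompt can require combining several libraries under a precise output contract, rather than performing a short string or arithmetic manipulation.

```python
# Hypothetical BigCodeBench-style task (illustrative, not from the benchmark):
# one function-level prompt whose instruction requires combining several
# libraries (re, collections, pandas) with a precise output contract.

import re
from collections import Counter

import pandas as pd


def task_func(text: str) -> pd.DataFrame:
    """Count word frequencies in `text`, case-insensitively and ignoring
    punctuation, and return a DataFrame with columns ['word', 'count']
    sorted by count in descending order."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return (
        pd.DataFrame(list(counts.items()), columns=["word", "count"])
        .sort_values("count", ascending=False)
        .reset_index(drop=True)
    )
```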
Here are some key features of BigCodeBench:
- Precise evaluation & ranking: BigCodeBench maintains a leaderboard with the latest LLM rankings, before and after rigorous evaluation¹.
- Pre-generated samples: BigCodeBench accelerates code intelligence research by open-sourcing the samples generated by various LLMs¹.
- Execution environment: The execution environment in BigCodeBench is less constrained than EvalPlus's, in order to support tasks with diverse library dependencies¹.
- Test evaluation: BigCodeBench relies on Python's unittest framework to evaluate generated code¹ (see the sketch after this list).
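As a minimal sketch of what unittest-based checking looks like, a task's test suite resembles the class below. It exercises the hypothetical `task_func` from the earlier example (assumed to be defined in the same module), and the specific assertions are assumptions made for illustration, not tests drawn from the benchmark.

```python
# Minimal sketch of unittest-style evaluation: each task ships test cases
# that are executed against the generated solution. `task_func` refers to
# the hypothetical example shown earlier in this section.

import unittest


class TestTaskFunc(unittest.TestCase):
    def test_counts_and_order(self):
        df = task_func("the cat saw the dog and the cat")
        # 'the' occurs three times and must be ranked first.
        self.assertEqual(df.iloc[0]["word"], "the")
        self.assertEqual(int(df.iloc[0]["count"]), 3)

    def test_case_insensitive(self):
        df = task_func("Apple apple APPLE")
        # All three variants should collapse into one word with count 3.
        self.assertEqual(len(df), 1)
        self.assertEqual(int(df.iloc[0]["count"]), 3)


if __name__ == "__main__":
    unittest.main()
```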
¹ GitHub, bigcode-project/bigcodebench: BigCodeBench: The Next .... https://github.com/bigcode-project/bigcodebench/