PRMBench_Preview

Texts | Apache 2.0 | Introduced 2025-01-06

This is the official dataset for PRMBench, a benchmark for evaluating process-level reward models (PRMs). It consists of 6,216 data instances, each containing a question, a solution process, and a modified version of that process with errors introduced. The benchmark is designed to test whether PRMs can identify fine-grained error types in a solution process, and each instance is annotated with its error types and the reasons for the errors.
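To make the instance layout described above concrete, here is a minimal Python sketch of one record and a helper that locates the steps where the modified process departs from the original. The class name, all field names, the helper function, and the sample values are assumptions made for illustration. They are not the dataset's actual schema or contents, so check the released files for the real keys.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PRMBenchInstance:
    # Hypothetical field names; the released dataset may use different keys.
    question: str
    original_process: List[str]
    modified_process: List[str]
    error_types: List[str] = field(default_factory=list)
    error_reasons: List[str] = field(default_factory=list)


def changed_step_indices(instance: PRMBenchInstance) -> List[int]:
    """Return 0-based indices of steps that differ between the two processes."""
    original = instance.original_process
    modified = instance.modified_process
    # Compare the steps that both processes have in common.
    changed = [i for i, (a, b) in enumerate(zip(original, modified)) if a != b]
    # Steps present in only one of the processes also count as changed.
    shorter, longer = sorted((len(original), len(modified)))
    changed.extend(range(shorter, longer))
    return changed


# A made-up instance, not taken from the dataset.
example = PRMBenchInstance(
    question="What is 2 + 3 * 4?",
    original_process=["3 * 4 = 12", "2 + 12 = 14"],
    modified_process=["3 * 4 = 12", "2 + 12 = 15"],
    error_types=["illustrative_error_type"],
    error_reasons=["The final addition is incorrect."],
)

print(changed_step_indices(example))
```

A PRM under evaluation would be expected to flag the steps this helper returns. How the benchmark itself scores those predictions is not specified here.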