SummEdits

Introduced 2023-05-23

SummEdits is a benchmark that measures the ability of Large Language Models (LLMs) to reason about facts and detect factual inconsistencies in summaries. It was built with a newly proposed protocol for creating inconsistency-detection benchmarks.

The SummEdits benchmark spans 10 domains. Per sample, it is 20 times more cost-effective than previous benchmarks, and it is highly reproducible, with an estimated inter-annotator agreement of about 0.91.
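The underlying task can be sketched as binary classification over (document, summary) pairs: given a source document and a (possibly edited) summary, the model decides whether the summary remains factually consistent. The sketch below is a minimal, hypothetical evaluation loop; the sample data and the keyword-based `classify_consistency` stub are illustrative stand-ins for the real benchmark files and an LLM judge, not the benchmark's actual interface.

```python
# Minimal sketch of a SummEdits-style evaluation loop.
# The samples and the rule-based judge below are hypothetical;
# a real evaluation would load the benchmark data and prompt an LLM.

samples = [
    {"doc": "The company reported a profit of $5M in Q2.",
     "summary": "The company reported a $5M profit in Q2.",
     "label": "consistent"},
    {"doc": "The company reported a profit of $5M in Q2.",
     "summary": "The company reported a $5M loss in Q2.",
     "label": "inconsistent"},
]

def classify_consistency(doc: str, summary: str) -> str:
    """Hypothetical judge: flags a summary that introduces 'loss'
    when the document does not mention it."""
    if "loss" in summary and "loss" not in doc:
        return "inconsistent"
    return "consistent"

def balanced_accuracy(samples, predict) -> float:
    """Mean of per-class recall, a common metric for imbalanced
    binary detection tasks like this one."""
    per_class = {}  # label -> (correct, total)
    for s in samples:
        pred = predict(s["doc"], s["summary"])
        correct, total = per_class.get(s["label"], (0, 0))
        per_class[s["label"]] = (correct + (pred == s["label"]), total + 1)
    return sum(c / t for c, t in per_class.values()) / len(per_class)

print(balanced_accuracy(samples, classify_consistency))  # 1.0 on this toy data
```

Scoring with balanced accuracy keeps a trivial "always consistent" predictor from looking strong when one label dominates, since each class's recall contributes equally.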