SummEdits
Introduced 2023-05-23
SummEdits is a benchmark designed to measure the ability of Large Language Models (LLMs) to reason about facts and detect factual inconsistencies in summaries. It also introduces a new protocol for creating inconsistency-detection benchmarks.
The SummEdits benchmark spans 10 domains. Its creation protocol is about 20 times more cost-effective per sample than previous benchmarks and highly reproducible, with an estimated inter-annotator agreement of about 0.91.
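The inter-annotator agreement figure above is the kind of statistic commonly computed with Cohen's kappa, which corrects raw agreement for chance. A minimal sketch, using toy binary labels (not the actual SummEdits annotations, where agreement was estimated at about 0.91):

```python
def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of samples where annotators match.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's label distribution.
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical consistency judgments for five edited summaries.
ann1 = ["consistent", "inconsistent", "inconsistent", "consistent", "consistent"]
ann2 = ["consistent", "inconsistent", "consistent", "consistent", "consistent"]
print(round(cohens_kappa(ann1, ann2), 2))  # 0.55
```

A kappa near 0.91, as reported for SummEdits, indicates near-perfect agreement; the toy labels here disagree on one of five samples and score much lower.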