Robust Summarization Evaluation Benchmark

Texts
Introduced 2022-12-15

The Robust Summarization Evaluation Benchmark is a large human evaluation dataset comprising more than 22k summary-level annotations covering state-of-the-art systems across three datasets.

Source: Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation