GEM
Generation, Evaluation, and Metrics
Introduced: 2021-02-02
Generation, Evaluation, and Metrics (GEM) is a benchmark environment for Natural Language Generation with a focus on its Evaluation, both through human annotations and automated Metrics.
GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.
Our goal is to update GEM regularly and to encourage more inclusive practices in dataset development by extending existing data or developing datasets for additional languages.
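As a minimal illustration of the automated side of this evaluation, the sketch below computes a simplified ROUGE-2 F1 score, i.e. bigram overlap between a generated text and a reference. This is a toy reimplementation for illustration (the example strings are invented), not the official GEM evaluation suite, which applies additional preprocessing such as stemming.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Lowercase whitespace tokenization, then count bigrams."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-2: F1 over the bigram multiset overlap
    between a candidate and a single reference."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # min counts per shared bigram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy example (invented strings, not GEM data):
print(rouge2_f1("the cat sat on the mat",
                "the cat lay on the mat"))  # 0.6
```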
Source: https://gem-benchmark.com/
Image source: Gehrmann et al.
Related Benchmarks
- GEM-XSum / Extreme Summarization / BLEU score
- GEM-XSum / Extreme Summarization / Parameters
- GEM-XSum / Extreme Summarization / ROUGE-2
- GEMBench / Robot Manipulation / Average Success Rate
- GEMBench / Robot Manipulation / Average Success Rate (L1)
- GEMBench / Robot Manipulation / Average Success Rate (L2)
- GEMBench / Robot Manipulation / Average Success Rate (L3)
- GEMBench / Robot Manipulation / Average Success Rate (L4)