HAVOC

Harmful Abstractions and Violations in Open Completions Benchmark

TextsApache 2.0Introduced 2025-05-04

measure the toxicity generated by language models across input severity and harm categories, by creating a new benchmark of open ended prefixes. We sampled 10,376 snippets from web pages across the dimensions and harms as described in the paper - https://arxiv.org/pdf/2505.02009.