TriBERT

Texts · MIT · Introduced 2023-07-23

The TriBERT dataset consists of 12,049 training, 2,527 validation, and 2,560 test human-machine collaborative texts. Each text contains both human-written and LLM-generated parts, which can appear in different orders (for example, human → AI or AI → human). Each sample therefore has between 1 and 3 boundaries, each marking a sentence where authorship changes. The texts were built from human-written essays, with LLM-generated sections added using ChatGPT.
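To make the notion of a boundary concrete, here is a minimal Python sketch that derives boundary positions from per-sentence authorship labels. The label format (a list of "human" / "ai" strings, one per sentence) and the function name `boundary_indices` are assumptions made for illustration; they are not the dataset's actual schema or tooling.

```python
from typing import List


def boundary_indices(labels: List[str]) -> List[int]:
    """Return each index i where sentence i has a different author than sentence i - 1.

    `labels` is a hypothetical per-sentence authorship list, e.g. ["human", "ai"].
    """
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Hypothetical sample: human -> AI -> human, which gives two boundaries.
sample = ["human", "human", "ai", "ai", "human"]
print(boundary_indices(sample))  # [2, 4]
```

Under this representation, a sample with three boundaries would switch author three times, for instance human → AI → human → AI.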

Related Benchmarks

TriBERT (in-domain)/Boundary Detection/F1@3