LegalCore

Texts
Introduced 2025-02-18

Recognizing events and their coreferential mentions in a document is essential for understanding semantic meanings of text. The existing research on event coreference resolution is mostly limited to news articles. In this paper, we present the first dataset for the legal domain, LegalCore, which has been annotated with comprehensive event and event coreference information. The legal contract documents we annotated in this dataset are several times longer than news articles, with an average length of around 25k tokens per document. The annotations show that legal documents have dense event mentions and feature both short-distance and super long-distance coreference links between event mentions. We further benchmark mainstream Large Language Models (LLMs) on this dataset for both event identification and event coreference resolution tasks, and find that this dataset poses significant challenges for both open-source and proprietary LLMs, which all perform significantly worse than a supervised baseline.