ELMTEX Dataset
ELMTEX Dataset: Fine-Tuning Large Language Models for Structured Clinical Information Extraction
Modality: Tabular
License: Creative Commons Attribution 4.0 International
Introduced: 2025-02-03
We introduce a new dataset of clinical report summaries annotated with structured information across 15 categories. The dataset was created to address the lack of large-scale resources for clinical information extraction (IE). It also promotes the development of methods tailored to clinical data, with the aim of improving healthcare provision. The dataset contains 60,000 annotated English clinical report summaries, of which we translated over 24,000 into German.