Papers With Code 2

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision

Hang Dong, Víctor Suárez-Paniagua, Huayu Zhang, Minhong Wang, Emma Whitfield, Honghan Wu

2021-05-05 | Entity Linking

Abstract

The identification of rare diseases from clinical notes with Natural Language Processing (NLP) is challenging due to the few cases available for machine learning and the need for data annotation from clinical experts. We propose a method using ontologies and weak supervision. The approach includes two steps: (i) Text-to-UMLS, linking text mentions to concepts in the Unified Medical Language System (UMLS), with a named entity linking tool (e.g. SemEHR) and weak supervision based on customised rules and Bidirectional Encoder Representations from Transformers (BERT) based contextual representations, and (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in the Orphanet Rare Disease Ontology (ORDO). Using MIMIC-III US intensive care discharge summaries as a case study, we show that the Text-to-UMLS process can be greatly improved with weak supervision, without any annotated data from domain experts. Our analysis shows that the overall pipeline processing discharge summaries can surface rare disease cases, most of which are not captured by the manually assigned ICD codes of the hospital admissions.
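The two-step pipeline in the abstract can be illustrated with a minimal sketch. It assumes the entity linker (SemEHR in the paper) has already produced candidate mentions with UMLS concept IDs, and it replaces the paper's weak supervision (rules plus a BERT-based classifier) with a few hypothetical hand-written rules only. The `Mention` class, the `weak_label`, `text_to_umls` and `umls_to_ordo` functions, the negation cue list, and the IDs `CUI_1`, `ORPHA_A`, etc. are all illustrative placeholders, not real UMLS/ORDO codes or the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Mention:
    text: str     # surface form found in the clinical note
    cui: str      # UMLS concept proposed by the entity linker (placeholder IDs)
    context: str  # sentence in which the mention occurs


# Hypothetical negation cues; a real system would use a proper negation detector.
NEGATION_CUES = ("no ", "denies ", "negative for ")


def weak_label(m: Mention) -> Optional[bool]:
    """Hypothetical labelling rules: True = keep, False = reject, None = abstain."""
    if len(m.text) <= 3 and m.text.isupper():
        return False  # short upper-case abbreviations are often ambiguous
    ctx = m.context.lower()
    idx = ctx.find(m.text.lower())
    if idx < 0:
        return None  # mention not found in its context: abstain
    window = ctx[max(0, idx - 20):idx]  # up to 20 characters before the mention
    if any(cue in window for cue in NEGATION_CUES):
        return False  # negated mention
    return True


def text_to_umls(mentions: List[Mention]) -> Set[str]:
    """Step (i): keep only the concepts whose mentions the rules accept."""
    return {m.cui for m in mentions if weak_label(m) is True}


def umls_to_ordo(cuis: Set[str], mapping: Dict[str, str]) -> Set[str]:
    """Step (ii): map accepted UMLS concepts to ORDO rare-disease entries."""
    return {mapping[c] for c in cuis if c in mapping}


mentions = [
    Mention("Fabry disease", "CUI_1", "History of Fabry disease, on enzyme therapy."),
    Mention("MS", "CUI_2", "MS noted on echo."),
    Mention("sarcoidosis", "CUI_3", "Patient denies sarcoidosis in family."),
]
cui_to_ordo = {"CUI_1": "ORPHA_A", "CUI_3": "ORPHA_B"}  # placeholder mapping

print(umls_to_ordo(text_to_umls(mentions), cui_to_ordo))
```

In this sketch the abbreviation `MS` and the negated `sarcoidosis` mention are filtered out in step (i), so only the placeholder `ORPHA_A` survives step (ii). Abstentions are simply dropped here; in the paper, the rule-derived labels instead serve as weak training data for a contextual classifier.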

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Entity Linking | Rare Diseases Mentions in MIMIC-III (Text-to-UMLS) | F1 | 0.858 | SemEHR+WS (rules+BlueBERT) |
| Entity Linking | Rare Diseases Mentions in MIMIC-III | F1 | 0.702 | SemEHR+WS (rules+BlueBERT) |
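Both results are reported as F1, the harmonic mean of precision and recall. A minimal sketch of computing it over sets of predicted and gold (mention, concept) pairs follows. The pair representation and the toy data are hypothetical and do not reproduce the paper's evaluation protocol.

```python
from typing import Set, Tuple

Pair = Tuple[int, str]  # (mention id, linked concept): hypothetical representation


def f1_score(predicted: Set[Pair], gold: Set[Pair]) -> float:
    """F1 = 2PR / (P + R), with precision P and recall R over exact pair matches."""
    tp = len(predicted & gold)  # true positives: pairs both predicted and gold
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data: 2 of 3 predictions are correct (P = 2/3), 2 of 4 gold pairs found (R = 1/2).
predicted = {(1, "A"), (2, "B"), (3, "C")}
gold = {(1, "A"), (2, "B"), (4, "D"), (5, "E")}
print(round(f1_score(predicted, gold), 4))  # 0.5714, i.e. 4/7
```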
