Improving Clinical Document Understanding on COVID-19 Research with Spark NLP

Veysel Kocaman, David Talby

Date: 2020-12-07
Tasks: Clinical Assertion Status Detection, Anatomy, Document Understanding, Named Entity Recognition (NER), Clinical Concept Extraction

Abstract

Following the global COVID-19 pandemic, the number of scientific papers studying the virus has grown massively, leading to increased interest in automated literature review. We present a clinical text mining system that improves on previous efforts in three ways. First, it can recognize over 100 different entity types, including social determinants of health, anatomy, risk factors, and adverse events, in addition to other commonly used clinical and biomedical entities. Second, the text processing pipeline includes assertion status detection, to distinguish between clinical facts that are present, absent, conditional, or about someone other than the patient. Third, the deep learning models used are more accurate than those previously available, leveraging an integrated pipeline of state-of-the-art pretrained named entity recognition models and improving on the previous best-performing benchmarks for assertion status detection. We illustrate extracting trends and insights, e.g. the most frequent disorders and symptoms and the most common vital signs and EKG findings, from the COVID-19 Open Research Dataset (CORD-19). The system is built using the Spark NLP library, which natively supports scaling to distributed clusters, leveraging GPUs, configurable and reusable NLP pipelines, healthcare-specific embeddings, and the ability to train models to support new entity types or human languages with no code changes.
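The role of assertion status in the trend extraction described above can be sketched in plain Python. This is only an illustration, not the Spark NLP API or the authors' pipeline: the `Mention` record, the `most_frequent` helper, the entity type names, and the assertion label strings are all hypothetical, and the sample mentions are made up for the example. The point is that only mentions asserted as present for the patient are counted, so a negated "cough" does not inflate the symptom tally.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Mention(NamedTuple):
    """One extracted entity mention (hypothetical record, not a Spark NLP type)."""
    text: str         # surface form found in the document
    entity_type: str  # e.g. "Symptom" or "Disorder" (illustrative type names)
    assertion: str    # e.g. "present", "absent", "conditional", "someone_else"


def most_frequent(
    mentions: List[Mention],
    entity_type: str,
    assertion: str = "present",
    top_n: int = 3,
) -> List[Tuple[str, int]]:
    """Count mentions of one entity type, keeping only one assertion status."""
    counts = Counter(
        m.text.lower()  # fold case so "Fever" and "fever" are counted together
        for m in mentions
        if m.entity_type == entity_type and m.assertion == assertion
    )
    return counts.most_common(top_n)


# Made-up mentions standing in for the output of an NER + assertion pipeline.
mentions = [
    Mention("fever", "Symptom", "present"),
    Mention("Fever", "Symptom", "present"),
    Mention("cough", "Symptom", "present"),
    Mention("cough", "Symptom", "absent"),          # negated, so not counted
    Mention("pneumonia", "Disorder", "present"),
    Mention("diabetes", "Disorder", "someone_else"),  # family history, not counted
]

top_symptoms = most_frequent(mentions, "Symptom")
print(top_symptoms)  # [('fever', 2), ('cough', 1)]
```

Filtering before counting is the design choice that matters here: without the assertion filter, absent or family-history findings would be reported as if they were observed in the patient.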

Results

Task | Dataset | Metric | Value | Model
Clinical Assertion Status Detection | 2010 i2b2/VA | Micro F1 | 0.939 | BiLSTM (SparkNLP)
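For readers unfamiliar with the metric in the table, the following is a minimal sketch of micro-averaged F1 for a single-label classification task such as assertion status. It pools true positives, false positives, and false negatives across all labels before computing precision and recall. The `micro_f1` function, the label names, and the toy gold and predicted sequences are assumptions for illustration; the listing does not describe the benchmark's exact scoring procedure, so this is not a reproduction of the reported 0.939.

```python
from typing import Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all labels, then compute P, R, F1."""
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1  # predicted this label and it was correct
            elif p == label and g != label:
                fp += 1  # predicted this label but gold differs
            elif p != label and g == label:
                fn += 1  # gold has this label but it was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data with hypothetical assertion labels (not taken from the i2b2 dataset).
labels = ["present", "absent", "conditional"]
gold = ["present", "absent", "present", "conditional"]
pred = ["present", "present", "present", "conditional"]

score = micro_f1(gold, pred, labels)
print(score)  # 0.75
```

When every example has exactly one gold label and one predicted label, and all labels are included, micro F1 equals plain accuracy, which is why the toy score matches 3 correct out of 4.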
