Biomedical Named Entity Recognition at Scale

Veysel Kocaman, David Talby

Date: 2020-11-12
Tasks: Medical Named Entity Recognition, Named Entity Recognition (NER), Question Answering, Relation Extraction, De-identification, Entity Resolution, Information Retrieval

Abstract

Named entity recognition (NER) is a widely applicable natural language processing task and building block of question answering, topic modeling, information retrieval, etc. In the medical domain, NER plays a crucial role by extracting meaningful chunks from clinical notes and reports, which are then fed to downstream tasks like assertion status detection, entity resolution, relation extraction, and de-identification. Reimplementing a Bi-LSTM-CNN-Char deep learning architecture on top of Apache Spark, we present a single trainable NER model that obtains new state-of-the-art results on seven public biomedical benchmarks without using heavy contextual embeddings like BERT. This includes improving BC4CHEMD to 93.72% (4.1% gain), Species800 to 80.91% (4.6% gain), and JNLPBA to 81.29% (5.2% gain). In addition, this model is freely available within a production-grade code base as part of the open-source Spark NLP library; can scale up for training and inference in any Spark cluster; has GPU support and libraries for popular programming languages such as Python, R, Scala and Java; and can be extended to support other human languages with no code changes.

Results

Task                            Dataset          Metric  Value  Model
Named Entity Recognition (NER)  NCBI-disease     F1      89.13  BLSTM-CNN-Char (SparkNLP)
Named Entity Recognition (NER)  NCBI-disease     F1      89.13  Spark NLP
Named Entity Recognition (NER)  Species800       F1      80.91  BLSTM-CNN-Char (SparkNLP)
Named Entity Recognition (NER)  LINNAEUS         F1      86.26  BLSTM-CNN-Char (SparkNLP)
Named Entity Recognition (NER)  LINNAEUS         F1      86.26  Spark NLP
Named Entity Recognition (NER)  BioNLP13-CG      F1      85.58  BLSTM-CNN-Char (SparkNLP)
Named Entity Recognition (NER)  BC5CDR-chemical  F1      94.88  Spark NLP
Named Entity Recognition (NER)  AnatEM           F1      89.13  BLSTM-CNN-Char (SparkNLP)
Named Entity Recognition (NER)  Species-800      F1      80.91  Spark NLP
Named Entity Recognition (NER)  BC4CHEMD         F1      93.72  BLSTM-CNN-Char (SparkNLP)
Named Entity Recognition (NER)  BC2GM            F1      88.75  Spark NLP
Named Entity Recognition (NER)  BC5CDR           F1      89.73  BLSTM-CNN-Char (SparkNLP)
Named Entity Recognition (NER)  BC5CDR           F1      89.73  Spark NLP
Named Entity Recognition (NER)  JNLPBA           F1      81.29  BLSTM-CNN-Char (SparkNLP)
Named Entity Recognition (NER)  JNLPBA           F1      81.29  Spark NLP
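
The F1 values above are reported without a definition on this page. As a rough illustration only, the Python sketch below computes an exact-match, entity-level F1 from BIO-tagged sequences, which is a common way NER benchmarks are scored. This is an assumption, not a description of the paper's evaluation script; the function names `bio_to_spans` and `entity_f1` and the example tags are hypothetical.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    Stray I- tags that do not continue an open span of the same
    type are ignored in this sketch.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Exact-match entity-level F1 over parallel lists of tag sequences."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index so spans from
        # different sentences never collide.
        gold_spans |= {(idx, *s) for s in bio_to_spans(g)}
        pred_spans |= {(idx, *s) for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two gold entities, one of them predicted.
gold = [["B-Disease", "I-Disease", "O", "B-Disease"]]
pred = [["B-Disease", "I-Disease", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Under this scoring an entity counts as correct only when its boundaries and type both match, so partially overlapping predictions earn no credit.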
