Veysel Kocaman, David Talby
Named entity recognition (NER) is a widely applicable natural language processing task and a building block of question answering, topic modeling, information retrieval, etc. In the medical domain, NER plays a crucial role by extracting meaningful chunks from clinical notes and reports, which are then fed to downstream tasks like assertion status detection, entity resolution, relation extraction, and de-identification. Reimplementing a Bi-LSTM-CNN-Char deep learning architecture on top of Apache Spark, we present a single trainable NER model that obtains new state-of-the-art results on seven public biomedical benchmarks without using heavy contextual embeddings like BERT. This includes improving BC4CHEMD to 93.72% (4.1% gain), Species800 to 80.91% (4.6% gain), and JNLPBA to 81.29% (5.2% gain). In addition, this model is freely available within a production-grade code base as part of the open-source Spark NLP library; can scale up for training and inference in any Spark cluster; has GPU support and libraries for popular programming languages such as Python, R, Scala and Java; and can be extended to support other human languages with no code changes.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Named Entity Recognition (NER) | NCBI-disease | F1 | 89.13 | BLSTM-CNN-Char (SparkNLP) |
| Named Entity Recognition (NER) | NCBI-disease | F1 | 89.13 | Spark NLP |
| Named Entity Recognition (NER) | Species800 | F1 | 80.91 | BLSTM-CNN-Char (SparkNLP) |
| Named Entity Recognition (NER) | LINNAEUS | F1 | 86.26 | BLSTM-CNN-Char (SparkNLP) |
| Named Entity Recognition (NER) | LINNAEUS | F1 | 86.26 | Spark NLP |
| Named Entity Recognition (NER) | BioNLP13-CG | F1 | 85.58 | BLSTM-CNN-Char (SparkNLP) |
| Named Entity Recognition (NER) | BC5CDR-chemical | F1 | 94.88 | Spark NLP |
| Named Entity Recognition (NER) | AnatEM | F1 | 89.13 | BLSTM-CNN-Char (SparkNLP) |
| Named Entity Recognition (NER) | Species800 | F1 | 80.91 | Spark NLP |
| Named Entity Recognition (NER) | BC4CHEMD | F1 | 93.72 | BLSTM-CNN-Char (SparkNLP) |
| Named Entity Recognition (NER) | BC2GM | F1 | 88.75 | Spark NLP |
| Named Entity Recognition (NER) | BC5CDR | F1 | 89.73 | BLSTM-CNN-Char (SparkNLP) |
| Named Entity Recognition (NER) | BC5CDR | F1 | 89.73 | Spark NLP |
| Named Entity Recognition (NER) | JNLPBA | F1 | 81.29 | BLSTM-CNN-Char (SparkNLP) |
| Named Entity Recognition (NER) | JNLPBA | F1 | 81.29 | Spark NLP |
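
The F1 values in the table are most naturally read as entity-level scores, where a predicted entity counts as correct only if both its type and its token boundaries match the gold annotation exactly. The sketch below shows one way such a score could be computed from BIO-tagged sequences. It assumes micro-averaging over all entities and exact-match scoring, which is the usual convention for these benchmarks but is not confirmed here as the authors' evaluation script. The function names `bio_to_spans` and `entity_f1` and the `Disease`/`Chemical` labels are illustrative only.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence.

    Assumption: an I- tag that does not continue the current entity
    starts a new one (a lenient reading of malformed sequences).
    """
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes an entity ending the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged, exact-match entity-level F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = bio_to_spans(gold_tags)
        pred_spans = bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example with made-up tags: one of two gold entities is recovered.
    gold = [["B-Disease", "I-Disease", "O", "B-Chemical"]]
    pred = [["B-Disease", "I-Disease", "O", "O"]]
    print(round(entity_f1(gold, pred), 4))
```

In the toy example, precision is 1.0 (the single predicted entity is correct) and recall is 0.5 (one of two gold entities is found), so F1 is 2/3. Exact-match scoring is strict: a prediction that gets the type right but misses one boundary token earns no credit.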