Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon

2020-07-31

Text Classification, Participant Intervention Comparison Outcome Extraction, Question Answering, Relation Extraction, Sentence Similarity, named-entity-recognition, Named Entity Recognition, Continual Pretraining, NER, Document Classification, Drug–drug Interaction Extraction, Named Entity Recognition (NER), Language Modelling

Abstract

Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general domain corpora, such as newswire and Web. A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models. To facilitate this investigation, we compile a comprehensive biomedical NLP benchmark from publicly-available datasets. Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks, leading to new state-of-the-art results across the board. Further, in conducting a thorough evaluation of modeling choices, both for pretraining and task-specific fine-tuning, we discover that some common practices are unnecessary with BERT models, such as using complex tagging schemes in named entity recognition (NER). To help accelerate research in biomedical NLP, we have released our state-of-the-art pretrained and task-specific models for the community, and created a leaderboard featuring our BLURB benchmark (short for Biomedical Language Understanding & Reasoning Benchmark) at https://aka.ms/BLURB.
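The remark about tagging schemes in NER can be made concrete with a small illustration. The sketch below is not the authors' code; it only shows, under the assumption that labels follow the common BIO convention (`B-X`, `I-X`, `O`), how a richer scheme collapses into the simpler IO scheme. The function name `bio_to_io` and the example sentence are hypothetical.

```python
def bio_to_io(tags):
    """Collapse BIO tags to IO: B-X and I-X both become I-X; O is unchanged."""
    io_tags = []
    for tag in tags:
        if tag == "O":
            io_tags.append("O")
        else:
            # Split "B-Disease" into prefix "B" and entity type "Disease".
            _, _, entity_type = tag.partition("-")
            io_tags.append("I-" + entity_type)
    return io_tags


# Hypothetical example sentence with hand-written labels.
tokens = ["Mutations", "in", "BRCA1", "cause", "breast", "cancer"]
bio = ["O", "O", "B-Gene", "O", "B-Disease", "I-Disease"]
io = bio_to_io(bio)

for token, bio_tag, io_tag in zip(tokens, bio, io):
    print(f"{token}\t{bio_tag}\t{io_tag}")
```

The IO scheme has fewer labels per entity type, at the cost of not being able to separate two directly adjacent entities of the same type.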

Results

Task	Dataset	Metric	Value	Model
Relation Extraction	GAD	Micro F1	82.34	PubMedBERT uncased
Relation Extraction	DDI	Micro F1	82.36	PubMedBERT uncased
Relation Extraction	ChemProt	Micro F1	77.24	PubMedBERT uncased
Question Answering	BLURB	Accuracy	71.7	PubMedBERT (uncased; abstracts)
Question Answering	PubMedQA	Accuracy	55.84	PubMedBERT uncased
Question Answering	BioASQ	Accuracy	87.56	PubMedBERT uncased
Information Extraction	DDI extraction 2013 corpus	F1	0.8236	PubMedBERT
Information Extraction	DDI extraction 2013 corpus	Micro F1	82.36	PubMedBERT
Information Extraction	EBM-NLP	F1	73.38	PubMedBERT uncased
Named Entity Recognition (NER)	NCBI-disease	F1	87.82	PubMedBERT uncased
Named Entity Recognition (NER)	BC2GM	F1	84.52	PubMedBERT uncased
Named Entity Recognition (NER)	JNLPBA	F1	79.1	PubMedBERT uncased
Text Classification	BLURB	F1	82.32	PubMedBERT (uncased; abstracts)
Text Classification	HOC	Micro F1	82.32	PubMedBERT uncased
Participant Intervention Comparison Outcome Extraction	EBM-NLP	F1	73.38	PubMedBERT uncased
Document Classification	HOC	Micro F1	82.32	PubMedBERT uncased
Biomedical Information Retrieval	EBM PICO	Macro F1 word level	73.38	PubMedBERT uncased
Classification	BLURB	F1	82.32	PubMedBERT (uncased; abstracts)
Classification	HOC	Micro F1	82.32	PubMedBERT uncased
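
Several rows above report Micro F1 and one reports Macro F1. As a reminder of how the two averages differ, here is a minimal sketch using the standard definitions; it is not the benchmark's evaluation script, and the per-class counts are made up for illustration.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts):
    """Pool the counts over all classes, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts):
    """Compute F1 per class, then take the unweighted mean."""
    scores = [f1(*c) for c in counts.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical (tp, fp, fn) counts for two classes.
counts = {"A": (8, 2, 2), "B": (1, 1, 3)}
micro = micro_f1(counts)
macro = macro_f1(counts)
print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")
```

Micro averaging is dominated by frequent classes, while macro averaging weights every class equally, so the two can diverge noticeably on imbalanced label sets.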
