Long N. Phan, James T. Anibal, Hieu Tran, Shaurya Chanana, Erol Bahadroglu, Alec Peltekian, Grégoire Altan-Bonnet
In this report, we introduce SciFive, a domain-specific T5 model pre-trained on large biomedical corpora. Our model outperforms the current SOTA methods (i.e. BERT, BioBERT, base T5) on tasks in named entity recognition, relation extraction, natural language inference, and question answering. We show that text-generation methods have significant potential in a broad array of biomedical NLP tasks, particularly those requiring longer, more complex outputs. Our results support the exploration of more difficult text-generation tasks and the development of new methods in this area.
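The key idea behind applying T5 to these benchmarks is that every task (NER, relation extraction, NLI, QA) is cast as text generation: the input is prefixed with a task tag and the model emits the answer as text. A minimal sketch of that framing is below; the prefix strings are illustrative placeholders, not the exact task tags used in the SciFive codebase.

```python
# Text-to-text task framing used by T5-style models such as SciFive:
# every task becomes "task_prefix: input text" -> generated output text,
# so one seq2seq model can handle NER, RE, and NLI alike.
# NOTE: these prefix names are assumptions for illustration only.

def to_text2text(task: str, text: str) -> str:
    """Prepend a task prefix so a single seq2seq model can route between tasks."""
    prefixes = {
        "ner": "ncbi_ner",      # e.g. tag disease mentions in a sentence
        "re": "chemprot_re",    # e.g. classify a chemical-protein relation
        "nli": "mednli",        # e.g. entailment between two clinical sentences
    }
    return f"{prefixes[task]}: {text}"

# The model would then generate the answer (entity spans, relation label,
# or entailment class) as plain text conditioned on this input string.
example = to_text2text("ner", "BRCA1 mutations increase breast cancer risk.")
```

Because outputs are free text rather than per-token labels or fixed logits, the same architecture extends naturally to the longer, more complex outputs the abstract highlights.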
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Relation Extraction | ChemProt | F1 | 78.0 | SciFive-Large |
| Relation Extraction | ChemProt | F1 | 77.4 | BioT5X (base) |
| Natural Language Inference | MedNLI | Accuracy | 86.57 | SciFive-Large |
| Natural Language Inference | MedNLI | Params (M) | 738 | SciFive-Large |
| Information Extraction | DDI extraction 2013 corpus | Micro F1 | 83.67 | SciFive-Large |
| Named Entity Recognition (NER) | NCBI-disease | F1 | 89.39 | SciFive-Base |
| Named Entity Recognition (NER) | BC5CDR-chemical | F1 | 94.76 | SciFive-Large |
| Named Entity Recognition (NER) | BC5CDR-disease | F1 | 87.62 | SciFive-Large |
| Named Entity Recognition (NER) | Species-800 | F1 | 76.55 | SciFive-Base |
| Named Entity Recognition (NER) | JNLPBA | F1 | 77.55 | SciFive-Large |
| Text Classification | HOC | F1 | 86.08 | SciFive-Large |