Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BioSentVec: creating sentence embeddings for biomedical texts

Qingyu Chen, Yifan Peng, Zhiyong Lu

2018-10-22 | Sentence Embeddings For Biomedical Texts | Benchmarking | Sentence Embeddings | Word Embeddings

Abstract

Sentence embeddings have become an essential part of today's natural language processing (NLP) systems, especially together with advanced deep learning methods. Although pre-trained sentence encoders are available in the general domain, none exists for biomedical texts to date. In this work, we introduce BioSentVec: the first open set of sentence embeddings trained with over 30 million documents from both scholarly articles in PubMed and clinical notes in the MIMIC-III Clinical Database. We evaluate BioSentVec embeddings in two sentence pair similarity tasks in different text genres. Our benchmarking results demonstrate that the BioSentVec embeddings capture sentence semantics better than competitive alternatives and achieve state-of-the-art performance in both tasks. We expect BioSentVec to facilitate research and development in biomedical text mining and to complement the existing resources in biomedical word embeddings. BioSentVec is publicly available at https://github.com/ncbi-nlp/BioSentVec
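
The sentence pair similarity evaluation mentioned in the abstract can be illustrated with a small sketch. It assumes that each pair is scored by the cosine similarity of its two sentence vectors and that the predicted scores are compared to human-annotated scores with the Pearson correlation, the metric reported in the results. The function names, the toy vectors, and the toy gold scores are hypothetical and are not taken from the paper or from the BioSentVec repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def evaluate(embedding_pairs, gold_scores):
    """Score each pair by cosine similarity, then correlate with gold scores."""
    predicted = [cosine_similarity(u, v) for u, v in embedding_pairs]
    return pearson(predicted, gold_scores)


# Hypothetical toy data: 2-dimensional stand-ins for sentence embeddings.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical direction
    ([1.0, 0.0], [1.0, 1.0]),  # partially similar
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]
toy_gold = [4.0, 2.5, 0.0]  # hypothetical annotator scores

score = evaluate(toy_pairs, toy_gold)
print(f"Pearson correlation on toy data: {score:.3f}")
```

In practice the toy vectors would be replaced by embeddings produced by a trained sentence encoder, and the gold scores by the annotations shipped with a benchmark such as BIOSSES or MedSTS.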

Results

Task                     Dataset  Metric               Value  Model
Sentence Embeddings      BIOSSES  Pearson Correlation  0.817  BioSentVec (PubMed)
Sentence Embeddings      BIOSSES  Pearson Correlation  0.795  BioSentVec (PubMed + MIMIC-III)
Sentence Embeddings      BIOSSES  Pearson Correlation  0.35   BioSentVec (MIMIC-III)
Sentence Embeddings      BIOSSES  Pearson Correlation  0.345  Universal Sentence Encoder
Sentence Embeddings      MedSTS   Pearson Correlation  0.767  BioSentVec (PubMed + MIMIC-III)
Sentence Embeddings      MedSTS   Pearson Correlation  0.759  BioSentVec (MIMIC-III)
Sentence Embeddings      MedSTS   Pearson Correlation  0.75   BioSentVec (PubMed)
Sentence Embeddings      MedSTS   Pearson Correlation  0.714  Universal Sentence Encoder
Representation Learning  BIOSSES  Pearson Correlation  0.817  BioSentVec (PubMed)
Representation Learning  BIOSSES  Pearson Correlation  0.795  BioSentVec (PubMed + MIMIC-III)
Representation Learning  BIOSSES  Pearson Correlation  0.35   BioSentVec (MIMIC-III)
Representation Learning  BIOSSES  Pearson Correlation  0.345  Universal Sentence Encoder
Representation Learning  MedSTS   Pearson Correlation  0.767  BioSentVec (PubMed + MIMIC-III)
Representation Learning  MedSTS   Pearson Correlation  0.759  BioSentVec (MIMIC-III)
Representation Learning  MedSTS   Pearson Correlation  0.75   BioSentVec (PubMed)
Representation Learning  MedSTS   Pearson Correlation  0.714  Universal Sentence Encoder

Related Papers

Visual Place Recognition for Large-Scale UAV Applications (2025-07-20)
From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment (2025-07-20)
Training Transformers with Enforced Lipschitz Constants (2025-07-17)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)