Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CLDR: Contrastive Learning Drug Response Models from Natural Language Supervision

Kun Li, Wenbin Hu

2023-12-17 · regression · Common Sense Reasoning · Drug Discovery · Drug Response Prediction · Contrastive Learning

Paper · PDF · Code (official)

Abstract

Deep learning-based drug response prediction (DRP) methods can accelerate the drug discovery process and reduce R&D costs. Although mainstream methods achieve high accuracy in predicting response regression values, their regression-aware representations are fragmented and fail to capture the continuity of the sample order. This leads to models being optimized toward sub-optimal solution spaces, reducing generalization ability and potentially incurring significant wasted costs in the drug discovery phase. In this paper, we propose CLDR, a contrastive learning framework with natural language supervision for DRP. CLDR converts regression labels into text, which is merged with the caption text of the drug response as a second modality of the samples, alongside the traditional modalities (graph, sequence). In each batch, the two modalities of one sample are treated as a positive pair and all other pairings as negative pairs. In addition, to enhance the continuous representation capability of the numerical text, a common-sense numerical knowledge graph is introduced. We validated CLDR on several hundred thousand samples from the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, observing average improvements of DRP methods ranging from 7.8% to 31.4% when our framework is applied. The experiments show that CLDR effectively constrains the samples to a continuous distribution in the representation space, and achieves strong prediction performance with only a few epochs of fine-tuning after pre-training. The code is available at: https://gitee.com/xiaoyibang/clipdrug.git.
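To make the pairing scheme concrete, below is a minimal PyTorch sketch of a CLIP-style objective of the kind the abstract describes: regression labels are rendered as text, and within each batch the (graph/sequence) encoding of a sample and its numeric caption form the positive pair while all other combinations serve as negatives. The encoder modules, the caption template, and the embedding dimension are assumptions for illustration, not the authors' implementation (which additionally uses a common-sense numerical knowledge graph, not shown here).

```python
# Minimal sketch (not the authors' code): CLIP-style contrastive pre-training for DRP,
# pairing each drug-response sample with a text rendering of its regression label.
# DrugCellEncoder-style sample_encoder, text_encoder, and the caption template are
# hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


def label_to_text(ic50: float, drug: str, cell_line: str) -> str:
    """Render the regression label as a natural-language caption (assumed template)."""
    return f"The response of cell line {cell_line} to drug {drug} has an ln(IC50) of {ic50:.3f}."


class ContrastiveDRP(nn.Module):
    def __init__(self, sample_encoder: nn.Module, text_encoder: nn.Module):
        super().__init__()
        self.sample_encoder = sample_encoder   # encodes the (graph, sequence) modality
        self.text_encoder = text_encoder       # encodes the numeric-caption modality
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # learnable temperature (log-space)

    def forward(self, sample_batch, text_batch):
        # L2-normalised embeddings of both modalities
        s = F.normalize(self.sample_encoder(sample_batch), dim=-1)
        t = F.normalize(self.text_encoder(text_batch), dim=-1)
        # Pairwise similarity matrix; the diagonal holds the matched (positive) pairs
        logits = self.logit_scale.exp() * s @ t.T
        targets = torch.arange(logits.size(0), device=logits.device)
        # Symmetric InfoNCE loss: each sample must retrieve its own caption and vice versa
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))
```

After pre-training with such an objective, the sample encoder could be reused with a small regression head, consistent with the abstract's note that only a few epochs of fine-tuning are needed.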

Results

Task            Dataset   Metric                                   Value   Model
Drug Discovery  GDSC      Pearson correlation coefficient (PCC)    0.536   CLDR
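For reference, the reported metric measures the linear agreement between predicted and measured response values; the sketch below is a generic PCC implementation, not the paper's evaluation code.

```python
# Generic Pearson correlation coefficient between predictions and ground truth
# (e.g. ln(IC50) values on GDSC); assumed evaluation, for illustration only.
import numpy as np

def pearson_corr(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """PCC = cov(y_pred, y_true) / (std(y_pred) * std(y_true))."""
    y_pred = y_pred - y_pred.mean()
    y_true = y_true - y_true.mean()
    return float((y_pred * y_true).sum() / (np.linalg.norm(y_pred) * np.linalg.norm(y_true)))
```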

Related Papers

- Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression (2025-07-20)
- Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
- SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
- HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
- Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
- SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
- Neural Network-Guided Symbolic Regression for Interpretable Descriptor Discovery in Perovskite Catalysts (2025-07-16)
- Imbalanced Regression Pipeline Recommendation (2025-07-16)