
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery

He Cao, Zijing Liu, Xingyu Lu, Yuan YAO, Yu Li

2023-11-27 | Drug Discovery | Molecule Captioning

Abstract

The rapid evolution of artificial intelligence in drug discovery encounters challenges with generalization and extensive training, yet Large Language Models (LLMs) offer promise in reshaping interactions with complex molecular data. Our novel contribution, InstructMol, a multi-modal LLM, effectively aligns molecular structures with natural language via an instruction-tuning approach, utilizing a two-stage training strategy that adeptly combines limited domain-specific data with molecular and textual information. InstructMol showcases substantial performance improvements in drug discovery-related molecular tasks, surpassing leading LLMs and significantly reducing the gap with specialized models, thereby establishing a robust foundation for a versatile and dependable drug discovery assistant.

Results

Task                 Dataset   Metric   Value  Model
Molecule Captioning  ChEBI-20  BLEU-2   47.5   InstructMol-GS
Molecule Captioning  ChEBI-20  BLEU-4   37.1   InstructMol-GS
Molecule Captioning  ChEBI-20  METEOR   50.9   InstructMol-GS
Molecule Captioning  ChEBI-20  ROUGE-1  56.6   InstructMol-GS
Molecule Captioning  ChEBI-20  ROUGE-2  39.4   InstructMol-GS
Molecule Captioning  ChEBI-20  ROUGE-L  50.2   InstructMol-GS
Molecule Captioning  ChEBI-20  BLEU-2   46.6   InstructMol-G
Molecule Captioning  ChEBI-20  BLEU-4   36.5   InstructMol-G
Molecule Captioning  ChEBI-20  METEOR   49.1   InstructMol-G
Molecule Captioning  ChEBI-20  ROUGE-1  54.7   InstructMol-G
Molecule Captioning  ChEBI-20  ROUGE-2  36.5   InstructMol-G
Molecule Captioning  ChEBI-20  ROUGE-L  47.9   InstructMol-G
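
To make the two variants easier to compare, the ChEBI-20 scores listed above can be differenced metric by metric. The snippet below is a minimal sketch that only does arithmetic on the reported values; the names `SCORES` and `gains_over` are hypothetical helpers introduced for illustration and are not part of the InstructMol code.

```python
# Scores copied from the results listed above (Molecule Captioning, ChEBI-20).
# SCORES and gains_over are illustrative names, not part of any official code.
SCORES = {
    "InstructMol-GS": {
        "BLEU-2": 47.5, "BLEU-4": 37.1, "METEOR": 50.9,
        "ROUGE-1": 56.6, "ROUGE-2": 39.4, "ROUGE-L": 50.2,
    },
    "InstructMol-G": {
        "BLEU-2": 46.6, "BLEU-4": 36.5, "METEOR": 49.1,
        "ROUGE-1": 54.7, "ROUGE-2": 36.5, "ROUGE-L": 47.9,
    },
}


def gains_over(model: str, baseline: str) -> dict[str, float]:
    """Return per-metric score differences (model minus baseline), rounded to 0.1."""
    return {
        metric: round(SCORES[model][metric] - SCORES[baseline][metric], 1)
        for metric in SCORES[model]
    }


gains = gains_over("InstructMol-GS", "InstructMol-G")
largest = max(gains, key=gains.get)

if __name__ == "__main__":
    for metric, delta in gains.items():
        print(f"{metric:8s} {delta:+.1f}")
    print("largest gain:", largest)
```

On these numbers InstructMol-GS scores higher than InstructMol-G on every listed metric, with the largest difference on ROUGE-2 and the smallest on BLEU-4.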
