Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Text-Guided Molecule Generation with Diffusion Language Model

Haisong Gong, Qiang Liu, Shu Wu, Liang Wang

Date: 2024-02-20
Tasks: Drug Discovery, Text-based de novo Molecule Generation, Language Modelling

Abstract

Text-guided molecule generation is a task where molecules are generated to match specific textual descriptions. Recently, most existing SMILES-based molecule generation methods rely on an autoregressive architecture. In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods. TGM-DLM updates token embeddings within the SMILES string collectively and iteratively, using a two-phase diffusion generation process. The first phase optimizes embeddings from random noise, guided by the text description, while the second phase corrects invalid SMILES strings to form valid molecular representations. We demonstrate that TGM-DLM outperforms MolT5-Base, an autoregressive model, without the need for additional data resources. Our findings underscore the remarkable effectiveness of TGM-DLM in generating coherent and precise molecules with specific properties, opening new avenues in drug discovery and related scientific domains. Code will be released at: https://github.com/Deno-V/tgm-dlm.
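
The abstract describes updating all token embeddings of a SMILES string jointly and iteratively, starting from random noise. The following is a toy illustration of that general idea only, not the authors' model: the vocabulary, the 2-D embeddings, and `toy_denoiser` (which simply returns the embeddings of a known target string in place of a learned, text-conditioned network) are all hypothetical stand-ins. The second, correction phase is not modelled here.

```python
import random

# Hypothetical toy vocabulary with 2-D embeddings (assumption, not the paper's).
VOCAB = {"C": (1.0, 0.0), "O": (0.0, 1.0), "N": (-1.0, 0.0), "=": (0.0, -1.0)}


def nearest_token(vec):
    """Round a continuous embedding to the closest vocabulary token."""
    return min(
        VOCAB,
        key=lambda tok: sum((a - b) ** 2 for a, b in zip(vec, VOCAB[tok])),
    )


def toy_denoiser(noisy, target):
    """Stand-in for a learned, text-conditioned network (assumption).

    A real model would predict clean embeddings from `noisy` and a text
    encoding; this toy version just looks up the target's embeddings.
    """
    return [VOCAB[tok] for tok in target]


def generate(target, steps=10, seed=0):
    """Refine every position at once, moving from noise to the prediction."""
    rng = random.Random(seed)
    # Every position starts as Gaussian noise.
    x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in target]
    for step in range(steps):
        predicted = toy_denoiser(x, target)
        alpha = (step + 1) / steps  # interpolation weight, reaches 1.0 at the end
        # All positions are updated jointly, unlike left-to-right decoding.
        x = [
            tuple((1 - alpha) * xi + alpha * pi for xi, pi in zip(vec, pred))
            for vec, pred in zip(x, predicted)
        ]
    # Map the final continuous embeddings back to discrete tokens.
    return "".join(nearest_token(vec) for vec in x)
```

The point of the sketch is the control flow: no token is fixed early, and the whole sequence is revised at each step before a final rounding to tokens.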

Results

Dataset: ChEBI-20. The same results are listed under both the Drug Discovery and Text-based de novo Molecule Generation tasks.

Metric                         | TGM-DLM w/o corr | TGM-DLM
BLEU                           | 82.8             | 82.6
Exact Match                    | 24.2             | 24.2
Frechet ChemNet Distance (FCD) | 0.89             | 0.77
Levenshtein                    | 16.897           | 17.003
MACCS FTS                      | 87.4             | 85.4
Morgan FTS                     | 72.2             | 68.8
Parameter Count                | 180,000,000      | 180,000,000
RDK FTS                        | 77.1             | 73.9
Text2Mol                       | 58.9             | 58.1
Validity                       | 78.9             | 87.1
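
Several of the reported metrics compare a generated SMILES string against a reference. As a rough sketch of what three of them measure, the helpers below compute an exact-match rate, a Levenshtein edit distance, and a Tanimoto similarity. The function names are illustrative, not taken from any benchmark code. Two simplifying assumptions apply: exact match is plain string equality here, whereas real evaluations typically compare canonicalized structures, and fingerprints are passed in as sets of on-bit indices, whereas the MACCS, Morgan, and RDK fingerprints would normally be computed with a cheminformatics toolkit.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = bits_a | bits_b
    if not union:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def exact_match_rate(generated: list, references: list) -> float:
    """Fraction of generated strings identical to their reference."""
    if len(generated) != len(references):
        raise ValueError("generated and references must have equal length")
    if not generated:
        return 0.0
    hits = sum(g == r for g, r in zip(generated, references))
    return hits / len(generated)
```

Lower is better for Levenshtein distance; higher is better for the exact-match rate and the Tanimoto similarity.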
