Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space

Jinho Chang, Jong Chul Ye

2024-05-28 · Text Retrieval · Text-based de novo Molecule Generation · Contrastive Learning

Abstract

With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques with conditional diffusion models. However, the unavoidable discreteness of a molecule makes it difficult for a diffusion model to connect raw data with highly complex conditions like natural language. To address this, we present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation. LDMol comprises a molecule autoencoder that produces a learnable and structurally informative feature space, and a natural language-conditioned latent diffusion model. In particular, recognizing that multiple SMILES notations can represent the same molecule, we employ a contrastive learning strategy to extract a feature space that is aware of the unique characteristics of the molecule structure. LDMol outperforms the existing baselines on the text-to-molecule generation benchmark, suggesting that diffusion models can outperform autoregressive models in text data generation given a better choice of the latent domain. Furthermore, we show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing, demonstrating its versatility as a diffusion model.
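The contrastive idea in the abstract (two SMILES strings for the same molecule should map to nearby features) can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch, not the paper's training objective: the abstract does not specify the loss, and the function names, temperature, and 2-D toy vectors below are assumptions standing in for real encoder outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss.

    anchors[i] and positives[i] are embeddings of two views of the same
    item; every other positives[j] acts as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)

# Hypothetical embeddings (assumed, not produced by any real encoder).
# Row 0 stands for ethanol written as "CCO" vs "OCC"; row 1 stands for
# benzene written as "c1ccccc1" vs "C1=CC=CC=C1".
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(view_a, view_b)          # matching views paired up
shuffled = info_nce(view_a, view_b[::-1])   # views paired with the wrong molecule
```

With these toy vectors the loss is lower when each embedding is paired with the alternative SMILES of its own molecule than when the pairs are swapped, which is the property a contrastive encoder is trained toward.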

Results

Task | Dataset | Metric | Value | Model
Drug Discovery | ChEBI-20 | BLEU | 92.6 | LDMol
Drug Discovery | ChEBI-20 | Exact Match | 53.3 | LDMol
Drug Discovery | ChEBI-20 | Frechet ChemNet Distance (FCD) | 0.2 | LDMol
Drug Discovery | ChEBI-20 | Levenshtein | 6.75 | LDMol
Drug Discovery | ChEBI-20 | MACCS FTS | 97.3 | LDMol
Drug Discovery | ChEBI-20 | Morgan FTS | 93.1 | LDMol
Drug Discovery | ChEBI-20 | RDK FTS | 95 | LDMol
Drug Discovery | ChEBI-20 | Validity | 94.1 | LDMol
Text-based de novo Molecule Generation | ChEBI-20 | BLEU | 92.6 | LDMol
Text-based de novo Molecule Generation | ChEBI-20 | Exact Match | 53.3 | LDMol
Text-based de novo Molecule Generation | ChEBI-20 | Frechet ChemNet Distance (FCD) | 0.2 | LDMol
Text-based de novo Molecule Generation | ChEBI-20 | Levenshtein | 6.75 | LDMol
Text-based de novo Molecule Generation | ChEBI-20 | MACCS FTS | 97.3 | LDMol
Text-based de novo Molecule Generation | ChEBI-20 | Morgan FTS | 93.1 | LDMol
Text-based de novo Molecule Generation | ChEBI-20 | RDK FTS | 95 | LDMol
Text-based de novo Molecule Generation | ChEBI-20 | Validity | 94.1 | LDMol
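
Several of the metrics listed above are simple to state in code. The sketch below shows string-level Exact Match and Levenshtein distance between generated and reference SMILES, plus Tanimoto similarity on sets of fingerprint bit indices (the quantity behind the FTS rows). It is an illustration under assumptions: the example strings and bit sets are made up, and the benchmark's own evaluation scripts may normalize molecules or derive fingerprints with a cheminformatics toolkit, which is not reproduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0

def evaluate(generated, references):
    """Fraction of exact string matches and mean Levenshtein distance."""
    n = len(references)
    pairs = list(zip(generated, references))
    return {
        "exact_match": sum(g == r for g, r in pairs) / n,
        "levenshtein": sum(levenshtein(g, r) for g, r in pairs) / n,
    }

# Hypothetical model outputs and targets, for illustration only.
references = ["CCO", "c1ccccc1", "CC(=O)O"]
generated = ["CCO", "c1ccccc1O", "CC(=O)N"]
scores = evaluate(generated, references)

# Hypothetical fingerprint bit sets (not computed from real molecules).
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

Here one of three strings matches exactly and the other two each differ by a single edit, so the sketch reports an exact-match rate of one third and a mean edit distance of two thirds.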

Related Papers

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation (2025-07-15)
Latent Space Consistency for Sparse-View CT Reconstruction (2025-07-15)
Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding (2025-07-13)