Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Navigating the Design Space of Equivariant Diffusion-Based Generative Models for De Novo 3D Molecule Generation

Tuan Le, Julian Cremer, Frank Noé, Djork-Arné Clevert, Kristof Schütt

2023-09-29 | Drug Discovery | Unconditional Molecule Generation

Abstract

Deep generative diffusion models are a promising avenue for 3D de novo molecular design in materials science and drug discovery. However, their utility is still limited by suboptimal performance on large molecular structures and by limited training data. To address this gap, we explore the design space of E(3)-equivariant diffusion models, focusing on previously unexplored areas. Our extensive comparative analysis evaluates the interplay between continuous and discrete state spaces. From this investigation, we present the EQGAT-diff model, which consistently outperforms established models on the QM9 and GEOM-Drugs datasets. Notably, EQGAT-diff treats atom positions as continuous and chemical elements and bond types as categorical, and it uses time-dependent loss weighting, which substantially improves training convergence, the quality of generated samples, and inference time. We also show that including chemically motivated additional features, such as hybridization states, in the diffusion process enhances the validity of generated molecules. To further strengthen the applicability of diffusion models to limited training data, we investigate the transferability of EQGAT-diff, trained on the large PubChem3D dataset with implicit hydrogen atoms, to different target data distributions. Fine-tuning EQGAT-diff for just a few iterations yields an efficient distribution shift and further improves performance across datasets. Finally, we test our model on the Crossdocked dataset for structure-based de novo ligand generation, where it achieves state-of-the-art Vina docking scores, underlining the importance of our findings.
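
The split between continuous and categorical state spaces described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The cosine schedule, the uniform-resampling categorical kernel, the clipped signal-to-noise loss weight, and all function names (`alpha_bar`, `noise_positions`, `noise_categories`, `loss_weight`) are assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math
import random


def alpha_bar(t, T):
    """Cumulative signal level in [0, 1]; an illustrative cosine schedule (assumption)."""
    return math.cos(0.5 * math.pi * t / T) ** 2


def noise_positions(coords, t, T, rng):
    """Gaussian forward noising of 3D atom coordinates (continuous state)."""
    a = alpha_bar(t, T)
    n = len(coords)
    # Centre the input so that translating the molecule does not change the result.
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    return [
        [math.sqrt(a) * (c[k] - centre[k]) + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
         for k in range(3)]
        for c in coords
    ]


def noise_categories(labels, num_classes, t, T, rng):
    """Categorical forward noising: keep each label with probability alpha_bar,
    otherwise resample it uniformly over the classes (assumed kernel)."""
    a = alpha_bar(t, T)
    return [x if rng.random() < a else rng.randrange(num_classes) for x in labels]


def loss_weight(t, T, lo=0.05, hi=1.5):
    """Hypothetical time-dependent loss weight: signal-to-noise ratio clipped to [lo, hi]."""
    a = alpha_bar(t, T)
    snr = a / max(1.0 - a, 1e-12)
    return min(hi, max(lo, snr))


# Usage: noise a toy three-atom fragment at the midpoint of the schedule.
rng = random.Random(0)
coords = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [1.6, 1.0, 0.0]]
atom_types = [0, 0, 1]  # indices into a hypothetical element vocabulary
T = 500
noisy_xyz = noise_positions(coords, 250, T, rng)
noisy_types = noise_categories(atom_types, 5, 250, T, rng)
weight = loss_weight(250, T)
```

Under these assumptions, a training step would multiply the per-sample denoising loss by `loss_weight(t, T)`, so that nearly pure-noise time steps contribute less than low-noise ones. Bond types could be noised with the same categorical routine applied to the entries of the bond matrix.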

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Unconditional Molecule Generation | GEOM-DRUGS | PoseBusters Atoms Connected | 84.4 | EQGAT-diff |
| Unconditional Molecule Generation | GEOM-DRUGS | PoseBusters Validity | 59.7 | EQGAT-diff |
| Unconditional Molecule Generation | GEOM-DRUGS | Validity | 94.6 | EQGAT-diff |

Related Papers

Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
A Graph-in-Graph Learning Framework for Drug-Target Interaction Prediction (2025-07-15)
Graph Learning (2025-07-08)
Exploring Modularity of Agentic Systems for Drug Discovery (2025-06-27)
Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design (2025-06-26)
Large Language Model Agent for Modular Task Execution in Drug Discovery (2025-06-26)
PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning (2025-06-24)
A standard transformer and attention with linear biases for molecular conformer generation (2025-06-24)