Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra

Montgomery Bohde, Mrunali Manjrekar, Runzhong Wang, Shuiwang Ji, Connor W. Coley

Date: 2025-02-13
Task: De novo molecule generation from MS/MS spectrum (bonus chemical formulae)

Abstract

Mass spectrometry plays a fundamental role in elucidating the structures of unknown molecules and subsequent scientific discoveries. One formulation of the structure elucidation task is the conditional de novo generation of molecular structure given a mass spectrum. Toward a more accurate and efficient scientific discovery pipeline for small molecules, we present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. The encoder utilizes a transformer architecture and models mass spectra domain knowledge such as peak formulae and neutral losses, and the decoder is a discrete graph diffusion model restricted by the heavy-atom composition of a known chemical formula. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs, which are available in virtually infinite quantities, compared to structure-spectrum pairs that number in the tens of thousands. Extensive experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation. We provide several ablations to demonstrate the effectiveness of our diffusion and pretraining approaches and show consistent performance scaling with increasing pretraining dataset size. DiffMS code is publicly available at https://github.com/coleygroup/DiffMS.
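The abstract says the decoder is restricted by the heavy-atom composition of a known chemical formula. As a rough illustration of what that constraint provides, the sketch below derives the non-hydrogen atom counts from a formula string, which would fix the set of nodes a graph generator has to connect. The function names are hypothetical and are not taken from the DiffMS code base. The parser assumes a flat formula such as C9H8O4, with no parentheses, charges, or isotope labels.

```python
import re
from collections import Counter


def heavy_atom_counts(formula: str) -> Counter:
    """Count non-hydrogen atoms in a flat formula string such as 'C9H8O4'.

    Assumption: no parentheses, charges, or isotope labels appear in the input.
    """
    counts = Counter()
    # Each match is (element symbol, optional count); a missing count means 1.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element != "H":
            counts[element] += int(number) if number else 1
    return counts


def node_labels(formula: str) -> list[str]:
    """Expand the heavy-atom counts into one label per graph node."""
    counts = heavy_atom_counts(formula)
    return [element for element in sorted(counts) for _ in range(counts[element])]


if __name__ == "__main__":
    print(heavy_atom_counts("C9H8O4"))
    print(node_labels("C2H5Cl"))
```

With the node labels fixed in this way, a generator only needs to decide the bonds between the atoms, which is one plausible reading of "formula-restricted" in the abstract.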

Results

Task: De novo molecule generation from MS/MS spectrum (bonus chemical formulae)
Dataset: MassSpecGym

Metric             Value   Model
Top-1 Accuracy     2.3     DiffMS
Top-1 MCES         18.45   DiffMS
Top-1 Tanimoto     0.28    DiffMS
Top-10 Accuracy    4.25    DiffMS
Top-10 MCES        14.73   DiffMS
Top-10 Tanimoto    0.39    DiffMS
Top-1 MCES         37.76   Spec2Mol
Top-1 Tanimoto     0.12    Spec2Mol
Top-10 MCES        29.4    Spec2Mol
Top-10 Tanimoto    0.16    Spec2Mol
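
To make the Tanimoto and accuracy rows easier to read, here is a minimal sketch of how such top-k metrics could be computed. It assumes each molecule is represented by the set of "on" bit indices of a binary fingerprint, that top-k Tanimoto means the best similarity among the first k ranked candidates, and that top-k accuracy means an exact identifier match within the first k. These definitions and the function names are assumptions for illustration, not the benchmark's reference implementation.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)


def top_k_tanimoto(truth: set[int], ranked: list[set[int]], k: int) -> float:
    """Best similarity to the true fingerprint among the first k candidates."""
    return max((tanimoto(truth, cand) for cand in ranked[:k]), default=0.0)


def top_k_hit(true_id: str, ranked_ids: list[str], k: int) -> bool:
    """True if the correct structure identifier appears in the first k candidates."""
    return true_id in ranked_ids[:k]


if __name__ == "__main__":
    truth = {1, 2, 3}
    ranked = [{7, 8}, {2, 3, 4}, {1, 2, 3}]
    print(top_k_tanimoto(truth, ranked, k=1))
    print(top_k_tanimoto(truth, ranked, k=2))
    print(top_k_hit("mol-C", ["mol-A", "mol-B", "mol-C"], k=2))
```

Averaging these per-spectrum values over a test set would give numbers of the same kind as the table rows. MCES is a graph edit distance where lower is better, so it is not covered by this sketch.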

Related Papers

MADGEN: Mass-Spec attends to De Novo Molecular generation (2025-01-03)
MassSpecGym: A benchmark for the discovery and identification of molecules (2024-10-30)