Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MADGEN: Mass-Spec attends to De Novo Molecular generation

Yinkai Wang, Xiaohui Chen, LiPing Liu, Soha Hassoun

Date: 2025-01-03
Tasks: De novo molecule generation from MS/MS spectrum; De novo molecule generation from MS/MS spectrum (bonus chemical formulae); Contrastive Learning; Retrieval

Abstract

The annotation (assigning structural chemical identities) of MS/MS spectra remains a significant challenge due to the enormous molecular diversity in biological samples and the limited scope of reference databases. Currently, the vast majority of spectral measurements remain in the "dark chemical space" without structural annotations. To improve annotation, we propose MADGEN (Mass-spec Attends to De Novo Molecular GENeration), a scaffold-based method for de novo molecular structure generation guided by mass spectrometry data. MADGEN operates in two stages: scaffold retrieval, followed by spectra-conditioned molecular generation starting from the scaffold. In the first stage, given an MS/MS spectrum, we formulate scaffold retrieval as a ranking problem and employ contrastive learning to align mass spectra with candidate molecular scaffolds. In the second stage, starting from the retrieved scaffold, we use the MS/MS spectrum to guide an attention-based generative model that produces the final molecule. Our approach constrains the molecular generation search space, reducing its complexity and improving generation accuracy. We evaluate MADGEN on three datasets (NIST23, CANOPUS, and MassSpecGym), using both a predictive scaffold retriever and an oracle retriever. The results demonstrate that using attention to integrate spectral information throughout the generation process is effective, yielding strong results with the oracle retriever.
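
The first stage described in the abstract (scaffold retrieval as a ranking problem, trained contrastively) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings are hand-written toy vectors, the function names `rank_scaffolds` and `info_nce` are hypothetical, and InfoNCE is used here only as a common contrastive objective, since the abstract does not say which loss MADGEN uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_scaffolds(spectrum_emb, scaffold_embs):
    """Return candidate scaffold indices, most similar first."""
    scores = [cosine(spectrum_emb, s) for s in scaffold_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def info_nce(spectrum_emb, scaffold_embs, positive_idx, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the true scaffold's score.

    Assumption: one positive scaffold per spectrum, all others are negatives.
    """
    logits = [cosine(spectrum_emb, s) / temperature for s in scaffold_embs]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[positive_idx]

# Toy embeddings (hypothetical): one spectrum, three candidate scaffolds.
spectrum = [0.9, 0.1, 0.0]
scaffolds = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

ranking = rank_scaffolds(spectrum, scaffolds)
loss_true = info_nce(spectrum, scaffolds, positive_idx=1)
loss_wrong = info_nce(spectrum, scaffolds, positive_idx=2)
```

In this toy setup the scaffold at index 1 is ranked first, and the loss is lower when that scaffold is treated as the positive than when a dissimilar one is. Training would adjust the encoders so that each spectrum's true scaffold receives the highest score among the candidates.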

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Top-1 Accuracy | 1.31 | Madgen |
| De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Top-1 MCES | 27.47 | Madgen |
| De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Top-1 Tanimoto | 0.2 | Madgen |
| De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Top-10 Accuracy | 1.54 | Madgen |
| De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Top-10 MCES | 16.84 | Madgen |
| De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Top-10 Tanimoto | 0.26 | Madgen |
| De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Top-1 MCES | 45.89 | Spec2Mol |
| De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Top-1 Tanimoto | 0.19 | Spec2Mol |
| De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Top-10 MCES | 32.6 | Spec2Mol |
| De novo molecule generation from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Top-10 Tanimoto | 0.28 | Spec2Mol |
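
For readers unfamiliar with the metric names above, the sketch below shows one plausible way Top-k Accuracy and Top-k Tanimoto could be computed. The definitions are assumptions, not taken from this page: fingerprints are modelled as plain sets of "on" bit indices, Top-k Tanimoto is taken as the best similarity among the first k candidates, and the helper names are hypothetical. The best-of-k reading is at least consistent with the results listed, where Top-10 Tanimoto is never below Top-1 for the same model.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    # Returning 0.0 for two empty fingerprints is a convention chosen for this sketch.
    return len(a & b) / len(union) if union else 0.0

def top_k_tanimoto(candidate_fps, true_fp, k):
    """Best Tanimoto similarity among the first k ranked candidates (assumed definition)."""
    return max((tanimoto(c, true_fp) for c in candidate_fps[:k]), default=0.0)

def top_k_accuracy(ranked_predictions, truths, k):
    """Fraction of spectra whose true structure appears in the first k predictions."""
    hits = sum(1 for preds, truth in zip(ranked_predictions, truths) if truth in preds[:k])
    return hits / len(truths) if truths else 0.0

# Toy data (hypothetical): fingerprints as sets of bit indices.
true_fp = {1, 2, 3, 4}
candidates = [{1, 2, 5}, {1, 2, 3, 4}, {7}]
top1_sim = top_k_tanimoto(candidates, true_fp, k=1)
top3_sim = top_k_tanimoto(candidates, true_fp, k=3)

# Toy ranked predictions as structure identifiers, one list per spectrum.
predictions = [["A", "B"], ["C", "D"]]
truths = ["B", "X"]
acc_top1 = top_k_accuracy(predictions, truths, k=1)
acc_top2 = top_k_accuracy(predictions, truths, k=2)
```

MCES is not sketched here because computing it requires a graph-matching solver. A real evaluation would derive fingerprints from molecular structures with a cheminformatics toolkit rather than from hand-written sets.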
