JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data

Apurva Kalia, Yan Zhou Chen, Dilip Krishnan, Soha Hassoun

2024-11-18
Tasks: Molecule retrieval from MS/MS spectrum; Molecule retrieval from MS/MS spectrum (bonus chemical formulae)

Abstract

Motivation: A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectra and in spectra-to-molecular fingerprint prediction (FP), annotation rates remain low.

Results: We introduce in this paper a novel paradigm (JESTR) for annotation. Unlike prior approaches that explicitly construct molecular fingerprints or spectra, JESTR leverages the insight that molecules and their corresponding spectra are views of the same data and effectively embeds their representations in a joint space. Candidate structures are ranked based on cosine similarity between the embeddings of the query spectrum and each candidate. We evaluate JESTR against mol-to-spec and spec-to-FP annotation tools on three datasets. On average, for rank@[1-5], JESTR outperforms other tools by 23.6%-71.6%. We further demonstrate the strong value of regularization with candidate molecules during training, boosting rank@1 performance by 11.4% and enhancing the model's ability to discern between target and candidate molecules. When comparing JESTR's performance against that of publicly available pretrained models of SIRIUS and CFM-ID on appropriate subsets of the MassSpecGym benchmark dataset, JESTR outperforms these tools by 31% and 238%, respectively. Through JESTR, we offer a novel, promising avenue towards accurate annotation, thereby unlocking valuable insights into the metabolome.
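The ranking step described in the abstract (score each candidate by cosine similarity to the query spectrum embedding, then sort) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `cosine_similarity` and `rank_candidates` are hypothetical, the toy 3-dimensional vectors are made up, and the encoders that would produce real embeddings are not shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_a * norm_b)

def rank_candidates(spectrum_embedding, candidate_embeddings):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(spectrum_embedding, c) for c in candidate_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy embeddings, invented for illustration only.
query = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query, different length
    [1.0, 1.0, 0.0],  # partially aligned with the query
]
ranking = rank_candidates(query, candidates)
print(ranking)
```

Because cosine similarity ignores vector length, the second candidate ranks first even though its norm differs from the query's; only the direction in the joint space matters.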

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Molecule retrieval from MS/MS spectrum | MassSpecGym | Hit rate @ 1 | 15.62 | JESTR_NR |
| Molecule retrieval from MS/MS spectrum | MassSpecGym | Hit rate @ 20 | 60.55 | JESTR_NR |
| Molecule retrieval from MS/MS spectrum | MassSpecGym | Hit rate @ 5 | 37.47 | JESTR_NR |
| Molecule retrieval from MS/MS spectrum | MassSpecGym | Hit rate @ 1 | 15.13 | JESTR |
| Molecule retrieval from MS/MS spectrum | MassSpecGym | Hit rate @ 20 | 60.32 | JESTR |
| Molecule retrieval from MS/MS spectrum | MassSpecGym | Hit rate @ 5 | 36.75 | JESTR |
| Molecule retrieval from MS/MS spectrum | MassSpecGym | Hit rate @ 1 | 10.71 | ESP |
| Molecule retrieval from MS/MS spectrum | MassSpecGym | Hit rate @ 20 | 42.66 | ESP |
| Molecule retrieval from MS/MS spectrum | MassSpecGym | Hit rate @ 5 | 24.84 | ESP |
| Molecule retrieval from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Hit rate @ 1 | 11.85 | JESTR |
| Molecule retrieval from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Hit rate @ 20 | 61.46 | JESTR |
| Molecule retrieval from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Hit rate @ 5 | 32.95 | JESTR |
| Molecule retrieval from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Hit rate @ 1 | 11.82 | JESTR_NR |
| Molecule retrieval from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Hit rate @ 20 | 61.46 | JESTR_NR |
| Molecule retrieval from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Hit rate @ 5 | 33.48 | JESTR_NR |
| Molecule retrieval from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Hit rate @ 1 | 11.05 | ESP |
| Molecule retrieval from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Hit rate @ 20 | 52.2 | ESP |
| Molecule retrieval from MS/MS spectrum (bonus chemical formulae) | MassSpecGym | Hit rate @ 5 | 27.42 | ESP |
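
For readers unfamiliar with the metric, a hit rate @ k is commonly computed as the share of query spectra whose true molecule appears among the top k ranked candidates. The sketch below assumes that definition and assumes the values are reported as percentages; the page itself does not spell out either point. The function name `hit_rate_at_k` and the toy rankings are hypothetical and unrelated to the reported numbers.

```python
def hit_rate_at_k(ranked_candidates, targets, k):
    """Percentage of queries whose target is among the top-k ranked candidates."""
    if not ranked_candidates:
        raise ValueError("need at least one query")
    if len(ranked_candidates) != len(targets):
        raise ValueError("one target is required per query")
    hits = sum(
        1
        for ranking, target in zip(ranked_candidates, targets)
        if target in ranking[:k]
    )
    return 100.0 * hits / len(ranked_candidates)

# Toy data: four queries, each with three candidates ranked best-first.
rankings = [
    ["A", "B", "C"],  # target "A" is ranked 1st
    ["D", "E", "F"],  # target "E" is ranked 2nd
    ["G", "H", "I"],  # target "I" is ranked 3rd
    ["J", "K", "L"],  # target "Z" is not among the candidates
]
targets = ["A", "E", "I", "Z"]

for k in (1, 2, 3):
    print(k, hit_rate_at_k(rankings, targets, k))
```

By construction the metric can only stay the same or rise as k grows, which matches the pattern in the table: each model's value at k = 20 exceeds its value at k = 5, which exceeds its value at k = 1.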

Related Papers

MassSpecGym: A benchmark for the discovery and identification of molecules (2024-10-30)