MassSpecGym provides three challenges for benchmarking the discovery and identification of new molecules from MS/MS spectra:

- 💥 De novo molecule generation (MS/MS spectrum → molecular structure)
    - ✨ Bonus chemical formulae challenge (MS/MS spectrum + chemical formula → molecular structure)
- 💥 Molecule retrieval (MS/MS spectrum → ranked list of candidate molecular structures)
    - ✨ Bonus chemical formulae challenge (MS/MS spectrum → ranked list of candidate molecular structures with ground-truth chemical formulae)
- 💥 Spectrum simulation (molecular structure → MS/MS spectrum)
    - ✨ Bonus chemical formulae challenge (molecular structure → MS/MS spectrum; evaluated on the retrieval of molecular structures with ground-truth chemical formulae)

The provided challenges abstract the process of scientific discovery from biological and environmental samples into well-defined machine learning problems with pre-defined datasets, data splits, and evaluation metrics.
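
To make the input → output shape of each challenge concrete, here is a minimal sketch of the three tasks as Python type signatures. This is an illustration only, not the MassSpecGym API: the names `Spectrum`, `Molecule`, `DeNovoGenerator`, `Retriever`, `Simulator`, and `rank_by_length` are hypothetical, molecules are assumed to be SMILES strings, and the retriever is assumed to receive its candidate list as an argument.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Spectrum:
    """Toy MS/MS spectrum: parallel sequences of peak m/z values and intensities."""
    mzs: Sequence[float]
    intensities: Sequence[float]


# Assumption: a molecular structure is represented as a SMILES string.
Molecule = str

# De novo generation: spectrum -> structure.
# The optional second argument stands in for the chemical formula
# that is additionally provided in the bonus variant.
DeNovoGenerator = Callable[[Spectrum, Optional[str]], Molecule]

# Retrieval: spectrum + candidate structures -> candidates ranked best-first.
Retriever = Callable[[Spectrum, Sequence[Molecule]], List[Molecule]]

# Simulation: structure -> predicted spectrum.
Simulator = Callable[[Molecule], Spectrum]


def rank_by_length(spectrum: Spectrum, candidates: Sequence[Molecule]) -> List[Molecule]:
    """Placeholder retriever: ignores the spectrum and sorts candidates by SMILES length."""
    return sorted(candidates, key=len)


# The placeholder satisfies the Retriever signature.
retriever: Retriever = rank_by_length
ranked = retriever(Spectrum(mzs=[100.0], intensities=[1.0]), ["CCO", "C", "CC"])
```

A real model would replace `rank_by_length` with a function that actually scores each candidate against the spectrum; the signature is what the challenge definition fixes.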