Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MoleculeNet: A Benchmark for Molecular Machine Learning

Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande

2017-03-02 | Molecular Property Prediction | imbalanced classification | BIG-bench Machine Learning

Abstract

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

Results

Task                           Dataset          Metric          Value  Model
Document Text Classification   Persian Twitter  Average Recall  0.99   XGBoost
Molecular Property Prediction  ESOL             RMSE            0.58   MPNN
Molecular Property Prediction  ESOL             RMSE            0.99   XGBoost
Atomistic Description          ESOL             RMSE            0.58   MPNN
Atomistic Description          ESOL             RMSE            0.99   XGBoost
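
The ESOL rows above are scored by RMSE (root mean squared error), where lower is better, so the listed 0.58 for MPNN indicates a closer fit than the 0.99 for XGBoost. As a minimal sketch of how such a score is computed, the following uses a hypothetical helper named rmse and made-up target and prediction values; none of the numbers come from ESOL or from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only (assumed for this sketch, not real benchmark data).
measured = [-0.5, -2.0, -3.5, -1.0]
predicted = [-1.0, -2.0, -3.0, -2.0]

score = rmse(measured, predicted)
print(f"RMSE = {score:.3f}")
```

Because the residuals are squared before averaging, a single large miss (the last pair here) weighs more heavily on the score than several small ones.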

Related Papers

Acquiring and Adapting Priors for Novel Tasks via Neural Meta-Architectures (2025-07-07)
Combining Graph Neural Networks and Mixed Integer Linear Programming for Molecular Inference under the Two-Layered Model (2025-07-05)
TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence (2025-06-26)
LSH-DynED: A Dynamic Ensemble Framework with LSH-Based Undersampling for Evolving Multi-Class Imbalanced Classification (2025-06-24)
Descriptor-based Foundation Models for Molecular Property Prediction (2025-06-18)
CopulaSMOTE: A Copula-Based Oversampling Approach for Imbalanced Classification in Diabetes Prediction (2025-06-18)
Robust Molecular Property Prediction via Densifying Scarce Labeled Data (2025-06-13)
BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models (2025-06-10)