Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules

Shengchao Liu, Mehmet Furkan Demirel, Yingyu Liang

2018-06-24 | NeurIPS 2019 12 | Molecular Property Prediction

Abstract

Machine learning techniques have recently been adopted in various applications in medicine, biology, chemistry, and material engineering. An important task is to predict the properties of molecules, which serves as the main subroutine in many downstream applications such as virtual screening and drug design. Despite the increasing interest, the key challenge is to construct proper representations of molecules for learning algorithms. This paper introduces the N-gram graph, a simple unsupervised representation for molecules. The method first embeds the vertices in the molecule graph. It then constructs a compact representation for the graph by assembling the vertex embeddings in short walks in the graph, which we show is equivalent to a simple graph neural network that needs no training. The representations can thus be efficiently computed and then used with supervised learning methods for prediction. Experiments on 60 tasks from 10 benchmark datasets demonstrate its advantages over both popular graph neural networks and traditional representation methods. This is complemented by theoretical analysis showing its strong representation and prediction power.
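
The construction described in the abstract (embed the vertices, then assemble embeddings along short walks) can be illustrated with a small sketch. It makes several assumptions that the abstract does not state: the vertex embeddings are supplied by the caller rather than learned, the embedding of a walk is taken to be the element-wise product of its vertex embeddings, and walks are counted as ordered vertex sequences (so each undirected walk contributes once per direction). The function name `ngram_graph_embedding` and its parameters are hypothetical and are not taken from the official code.

```python
import numpy as np


def ngram_graph_embedding(adjacency, vertex_embeddings, max_walk_length):
    """Concatenate summed walk embeddings for walk lengths 1..max_walk_length.

    adjacency:         (n, n) symmetric 0/1 matrix of the molecule graph.
    vertex_embeddings: (n, d) matrix, one embedding row per vertex (assumed given).
    Returns a vector of size d * max_walk_length.
    """
    A = np.asarray(adjacency, dtype=float)
    F1 = np.asarray(vertex_embeddings, dtype=float)

    # Row i holds the summed embeddings of all walks of the current length
    # that end at vertex i. For length 1 this is just the vertex embedding.
    walk_embeddings = F1.copy()
    parts = [walk_embeddings.sum(axis=0)]

    for _ in range(2, max_walk_length + 1):
        # Extend every walk by one edge (A @ ...), then multiply in the
        # embedding of the new end vertex (assumed element-wise product).
        walk_embeddings = (A @ walk_embeddings) * F1
        parts.append(walk_embeddings.sum(axis=0))

    # No parameters are trained here: the result is a fixed function of
    # the graph and the vertex embeddings.
    return np.concatenate(parts)


# Usage: a 3-vertex path graph 0 - 1 - 2 with 2-dimensional embeddings.
path_adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
toy_embeddings = [[1, 2], [3, 4], [5, 6]]
graph_vector = ngram_graph_embedding(path_adjacency, toy_embeddings, 2)
```

Each loop iteration is one round of neighbor aggregation with no learned weights, which is one way to read the abstract's statement that the representation corresponds to a simple graph neural network that needs no training. The resulting vector would then be passed to a supervised model such as a random forest or gradient-boosted trees.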

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Molecular Property Prediction | FreeSolv | RMSE | 2.688 | N-GramRF |
| Molecular Property Prediction | FreeSolv | RMSE | 5.061 | N-GramXGB |
| Molecular Property Prediction | ClinTox | ROC-AUC | 87.5 | N-GramXGB |
| Molecular Property Prediction | ClinTox | ROC-AUC | 77.5 | N-GramRF |
| Molecular Property Prediction | Lipophilicity | RMSE | 0.812 | N-GramRF |
| Molecular Property Prediction | Lipophilicity | RMSE | 2.072 | N-GramXGB |
| Molecular Property Prediction | QM7 | MAE | 81.9 | N-GramXGB |
| Molecular Property Prediction | QM7 | MAE | 92.8 | N-GramRF |
| Molecular Property Prediction | BBBP | ROC-AUC | 69.7 | N-GramRF |
| Molecular Property Prediction | BBBP | ROC-AUC | 69.1 | N-GramXGB |
| Molecular Property Prediction | QM9 | MAE | 0.00964 | N-GramXGB |
| Molecular Property Prediction | QM9 | MAE | 0.01037 | N-GramRF |
| Molecular Property Prediction | QM8 | MAE | 0.0215 | N-GramXGB |
| Molecular Property Prediction | QM8 | MAE | 0.0236 | N-GramRF |
| Molecular Property Prediction | SIDER | ROC-AUC | 66.8 | N-GramRF |
| Molecular Property Prediction | SIDER | ROC-AUC | 65.5 | N-GramXGB |
| Molecular Property Prediction | Tox21 | ROC-AUC | 75.8 | N-GramXGB |
| Molecular Property Prediction | Tox21 | ROC-AUC | 74.3 | N-GramRF |
| Molecular Property Prediction | BACE | ROC-AUC | 79.1 | N-GramXGB |
| Molecular Property Prediction | BACE | ROC-AUC | 77.9 | N-GramRF |
| Atomistic Description | FreeSolv | RMSE | 2.688 | N-GramRF |
| Atomistic Description | FreeSolv | RMSE | 5.061 | N-GramXGB |
| Atomistic Description | ClinTox | ROC-AUC | 87.5 | N-GramXGB |
| Atomistic Description | ClinTox | ROC-AUC | 77.5 | N-GramRF |
| Atomistic Description | Lipophilicity | RMSE | 0.812 | N-GramRF |
| Atomistic Description | Lipophilicity | RMSE | 2.072 | N-GramXGB |
| Atomistic Description | QM7 | MAE | 81.9 | N-GramXGB |
| Atomistic Description | QM7 | MAE | 92.8 | N-GramRF |
| Atomistic Description | BBBP | ROC-AUC | 69.7 | N-GramRF |
| Atomistic Description | BBBP | ROC-AUC | 69.1 | N-GramXGB |
| Atomistic Description | QM9 | MAE | 0.00964 | N-GramXGB |
| Atomistic Description | QM9 | MAE | 0.01037 | N-GramRF |
| Atomistic Description | QM8 | MAE | 0.0215 | N-GramXGB |
| Atomistic Description | QM8 | MAE | 0.0236 | N-GramRF |
| Atomistic Description | SIDER | ROC-AUC | 66.8 | N-GramRF |
| Atomistic Description | SIDER | ROC-AUC | 65.5 | N-GramXGB |
| Atomistic Description | Tox21 | ROC-AUC | 75.8 | N-GramXGB |
| Atomistic Description | Tox21 | ROC-AUC | 74.3 | N-GramRF |
| Atomistic Description | BACE | ROC-AUC | 79.1 | N-GramXGB |
| Atomistic Description | BACE | ROC-AUC | 77.9 | N-GramRF |
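
The results above use three metrics. RMSE and MAE are regression errors, so lower is better; ROC-AUC is a ranking score for binary classification, so higher is better. The sketch below shows the standard definitions in plain NumPy. The function names are illustrative, and the listed ROC-AUC values appear to be on a 0 to 100 scale, which is an assumption based on their magnitude rather than something the page states.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error (lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error (lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half. Uses all positive/negative pairs, which is fine
    for small examples but quadratic in memory for large datasets.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    positives, negatives = scores[labels], scores[~labels]
    if positives.size == 0 or negatives.size == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    diff = positives[:, None] - negatives[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)


# Usage on toy data; multiply by 100 to compare with a percentage scale.
toy_auc_percent = 100 * roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```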

Related Papers

Acquiring and Adapting Priors for Novel Tasks via Neural Meta-Architectures (2025-07-07)
Combining Graph Neural Networks and Mixed Integer Linear Programming for Molecular Inference under the Two-Layered Model (2025-07-05)
TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence (2025-06-26)
Descriptor-based Foundation Models for Molecular Property Prediction (2025-06-18)
Robust Molecular Property Prediction via Densifying Scarce Labeled Data (2025-06-13)
BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models (2025-06-10)
The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning (2025-06-09)
Graph Neural Networks in Modern AI-aided Drug Discovery (2025-06-07)