Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development

Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik

2021-02-18 | Molecular Property Prediction | Drug Discovery | BIG-bench Machine Learning | TDC ADMET Benchmarking Group

Abstract

Therapeutics machine learning is an emerging field with incredible opportunities for innovation and impact. However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeutics. To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools and community resources, including 33 data functions and types of meaningful data splits, 23 strategies for systematic model evaluation, 17 molecule generation oracles, and 29 public leaderboards. All resources are integrated and accessible via an open Python library. We carry out extensive experiments on selected datasets, demonstrating that even the strongest algorithms fall short of solving key therapeutics challenges, including real dataset distributional shifts, multi-scale modeling of heterogeneous data, and robust generalization to novel data points. We envision that TDC can facilitate algorithmic and scientific advances and considerably accelerate machine-learning model development, validation and transition into biomedical and clinical implementation. TDC is an open-science initiative available at https://tdcommons.ai.
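
The abstract highlights meaningful data splits and generalization to novel data points. A minimal sketch of one such idea follows: a group-aware split that keeps every group (for instance, all molecules sharing a scaffold) inside a single subset. This is an illustration in plain Python, not TDC's implementation; the function name `group_split`, the toy records, and the scaffold labels are all hypothetical.

```python
import random
from collections import defaultdict


def group_split(items, group_of, frac=(0.7, 0.1, 0.2), seed=0):
    """Split items into train/valid/test so that no group spans two subsets.

    frac gives the target train and valid fractions; the test subset
    receives whatever remains, so frac[2] is informational only.
    """
    n = len(items)
    n_train = round(frac[0] * n)
    n_valid = round(frac[1] * n)

    # Bucket the items by their group key (hypothetical: a scaffold label).
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    # Shuffle the group keys reproducibly, then assign whole groups.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    train, valid, test = [], [], []
    for key in keys:
        members = groups[key]
        if len(train) + len(members) <= n_train:
            train.extend(members)
        elif len(valid) + len(members) <= n_valid:
            valid.extend(members)
        else:
            test.extend(members)
    return train, valid, test


# Toy data (made up): 20 records spread over 10 groups of 2 records each.
records = [(f"mol{i}", f"scaffold{i % 10}") for i in range(20)]
train, valid, test = group_split(records, group_of=lambda r: r[1])
print(len(train), len(valid), len(test))  # prints: 14 2 4
```

Because whole groups are held out, every test record belongs to a group never seen during training, which is one way to approximate the distribution shift the abstract describes. Subset sizes are only approximate when group sizes are uneven.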

Results

Task | Dataset | Metric | Value | Model
Drug Discovery | tdcommons | TDC.AMES | 0.823 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.BBB_Martins | 0.889 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.Bioavailability_Ma | 0.672 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.CYP2C9_Inhibition_Veith | 0.742 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.CYP2C9_Substrate_CarbonMangels | 0.36 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.CYP2D6_Inhibition_Veith | 0.616 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.CYP2D6_Substrate_CarbonMangels | 0.677 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.CYP3A4_Inhibition_Veith | 0.829 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.CYP3A4_Substrate_CarbonMangels | 0.639 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.Caco2_Wang | 0.393 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.Clearance_Hepatocyte_AZ | 0.382 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.Clearance_Microsome_AZ | 0.586 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.DILI | 0.875 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.HIA_Hou | 0.972 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.Half_Life_Obach | 0.184 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.LD50_Zhu | 0.678 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.Lipophilicity_AstraZeneca | 0.574 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.PPBR_AZ | 9.994 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.Pgp_Broccatelli | 0.918 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.Solubility_AqSolDB | 0.827 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.VDss_Lombardo | 0.561 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.hERG | 0.841 | MLP-RDKit2D
Drug Discovery | tdcommons | TDC.AMES | 0.814 | AttentiveFP
Drug Discovery | tdcommons | TDC.BBB_Martins | 0.855 | AttentiveFP
Drug Discovery | tdcommons | TDC.Bioavailability_Ma | 0.632 | AttentiveFP
Drug Discovery | tdcommons | TDC.CYP2C9_Inhibition_Veith | 0.749 | AttentiveFP
Drug Discovery | tdcommons | TDC.CYP2C9_Substrate_CarbonMangels | 0.375 | AttentiveFP
Drug Discovery | tdcommons | TDC.CYP2D6_Inhibition_Veith | 0.646 | AttentiveFP
Drug Discovery | tdcommons | TDC.CYP2D6_Substrate_CarbonMangels | 0.574 | AttentiveFP
Drug Discovery | tdcommons | TDC.CYP3A4_Inhibition_Veith | 0.851 | AttentiveFP
Drug Discovery | tdcommons | TDC.CYP3A4_Substrate_CarbonMangels | 0.576 | AttentiveFP
Drug Discovery | tdcommons | TDC.Caco2_Wang | 0.401 | AttentiveFP
Drug Discovery | tdcommons | TDC.Clearance_Hepatocyte_AZ | 0.289 | AttentiveFP
Drug Discovery | tdcommons | TDC.Clearance_Microsome_AZ | 0.365 | AttentiveFP
Drug Discovery | tdcommons | TDC.DILI | 0.886 | AttentiveFP
Drug Discovery | tdcommons | TDC.HIA_Hou | 0.974 | AttentiveFP
Drug Discovery | tdcommons | TDC.Half_Life_Obach | 0.085 | AttentiveFP
Drug Discovery | tdcommons | TDC.LD50_Zhu | 0.678 | AttentiveFP
Drug Discovery | tdcommons | TDC.Lipophilicity_AstraZeneca | 0.572 | AttentiveFP
Drug Discovery | tdcommons | TDC.PPBR_AZ | 9.373 | AttentiveFP
Drug Discovery | tdcommons | TDC.Pgp_Broccatelli | 0.892 | AttentiveFP
Drug Discovery | tdcommons | TDC.Solubility_AqSolDB | 0.776 | AttentiveFP
Drug Discovery | tdcommons | TDC.VDss_Lombardo | 0.241 | AttentiveFP
Drug Discovery | tdcommons | TDC.hERG | 0.825 | AttentiveFP
Drug Discovery | tdcommons | TDC.AMES | 0.842 | AttrMasking
Drug Discovery | tdcommons | TDC.BBB_Martins | 0.892 | AttrMasking
Drug Discovery | tdcommons | TDC.Bioavailability_Ma | 0.577 | AttrMasking
Drug Discovery | tdcommons | TDC.CYP2C9_Inhibition_Veith | 0.829 | AttrMasking
Drug Discovery | tdcommons | TDC.CYP2C9_Substrate_CarbonMangels | 0.381 | AttrMasking
Drug Discovery | tdcommons | TDC.CYP2D6_Inhibition_Veith | 0.721 | AttrMasking
Drug Discovery | tdcommons | TDC.CYP2D6_Substrate_CarbonMangels | 0.704 | AttrMasking
Drug Discovery | tdcommons | TDC.CYP3A4_Inhibition_Veith | 0.902 | AttrMasking
Drug Discovery | tdcommons | TDC.CYP3A4_Substrate_CarbonMangels | 0.582 | AttrMasking
Drug Discovery | tdcommons | TDC.Caco2_Wang | 0.546 | AttrMasking
Drug Discovery | tdcommons | TDC.Clearance_Hepatocyte_AZ | 0.413 | AttrMasking
Drug Discovery | tdcommons | TDC.Clearance_Microsome_AZ | 0.585 | AttrMasking
Drug Discovery | tdcommons | TDC.DILI | 0.919 | AttrMasking
Drug Discovery | tdcommons | TDC.HIA_Hou | 0.978 | AttrMasking
Drug Discovery | tdcommons | TDC.Half_Life_Obach | 0.151 | AttrMasking
Drug Discovery | tdcommons | TDC.LD50_Zhu | 0.685 | AttrMasking
Drug Discovery | tdcommons | TDC.Lipophilicity_AstraZeneca | 0.547 | AttrMasking
Drug Discovery | tdcommons | TDC.PPBR_AZ | 10.075 | AttrMasking
Drug Discovery | tdcommons | TDC.Pgp_Broccatelli | 0.929 | AttrMasking
Drug Discovery | tdcommons | TDC.Solubility_AqSolDB | 1.026 | AttrMasking
Drug Discovery | tdcommons | TDC.VDss_Lombardo | 0.559 | AttrMasking
Drug Discovery | tdcommons | TDC.hERG | 0.778 | AttrMasking
Drug Discovery | tdcommons | TDC.AMES | 0.818 | GCN
Drug Discovery | tdcommons | TDC.BBB_Martins | 0.842 | GCN
Drug Discovery | tdcommons | TDC.Bioavailability_Ma | 0.566 | GCN
Drug Discovery | tdcommons | TDC.CYP2C9_Inhibition_Veith | 0.735 | GCN
Drug Discovery | tdcommons | TDC.CYP2C9_Substrate_CarbonMangels | 0.344 | GCN
Drug Discovery | tdcommons | TDC.CYP2D6_Inhibition_Veith | 0.616 | GCN
Drug Discovery | tdcommons | TDC.CYP2D6_Substrate_CarbonMangels | 0.617 | GCN
Drug Discovery | tdcommons | TDC.CYP3A4_Inhibition_Veith | 0.84 | GCN
Drug Discovery | tdcommons | TDC.CYP3A4_Substrate_CarbonMangels | 0.59 | GCN
Drug Discovery | tdcommons | TDC.Caco2_Wang | 0.599 | GCN
Drug Discovery | tdcommons | TDC.Clearance_Hepatocyte_AZ | 0.366 | GCN
Drug Discovery | tdcommons | TDC.Clearance_Microsome_AZ | 0.532 | GCN
Drug Discovery | tdcommons | TDC.DILI | 0.859 | GCN
Drug Discovery | tdcommons | TDC.HIA_Hou | 0.936 | GCN
Drug Discovery | tdcommons | TDC.Half_Life_Obach | 0.239 | GCN
Drug Discovery | tdcommons | TDC.LD50_Zhu | 0.649 | GCN
Drug Discovery | tdcommons | TDC.Lipophilicity_AstraZeneca | 0.541 | GCN
Drug Discovery | tdcommons | TDC.PPBR_AZ | 10.194 | GCN
Drug Discovery | tdcommons | TDC.Pgp_Broccatelli | 0.895 | GCN
Drug Discovery | tdcommons | TDC.Solubility_AqSolDB | 0.907 | GCN
Drug Discovery | tdcommons | TDC.VDss_Lombardo | 0.457 | GCN
Drug Discovery | tdcommons | TDC.hERG | 0.738 | GCN
Molecular Property Prediction | BBBP | ROC-AUC | 89.2 | AttrMasking
Molecular Property Prediction | BBBP | ROC-AUC | 85.5 | AttentiveFP
Atomistic Description | BBBP | ROC-AUC | 89.2 | AttrMasking
Atomistic Description | BBBP | ROC-AUC | 85.5 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.AMES | 0.823 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.BBB_Martins | 0.889 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.Bioavailability_Ma | 0.672 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.CYP2C9_Inhibition_Veith | 0.742 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.CYP2C9_Substrate_CarbonMangels | 0.36 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.CYP2D6_Inhibition_Veith | 0.616 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.CYP2D6_Substrate_CarbonMangels | 0.677 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.CYP3A4_Inhibition_Veith | 0.829 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.CYP3A4_Substrate_CarbonMangels | 0.639 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.Caco2_Wang | 0.393 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.Clearance_Hepatocyte_AZ | 0.382 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.Clearance_Microsome_AZ | 0.586 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.DILI | 0.875 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.HIA_Hou | 0.972 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.Half_Life_Obach | 0.184 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.LD50_Zhu | 0.678 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.Lipophilicity_AstraZeneca | 0.574 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.PPBR_AZ | 9.994 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.Pgp_Broccatelli | 0.918 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.Solubility_AqSolDB | 0.827 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.VDss_Lombardo | 0.561 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.hERG | 0.841 | MLP-RDKit2D
Therapeutics Data Commons | tdcommons | TDC.AMES | 0.814 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.BBB_Martins | 0.855 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.Bioavailability_Ma | 0.632 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.CYP2C9_Inhibition_Veith | 0.749 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.CYP2C9_Substrate_CarbonMangels | 0.375 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.CYP2D6_Inhibition_Veith | 0.646 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.CYP2D6_Substrate_CarbonMangels | 0.574 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.CYP3A4_Inhibition_Veith | 0.851 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.CYP3A4_Substrate_CarbonMangels | 0.576 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.Caco2_Wang | 0.401 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.Clearance_Hepatocyte_AZ | 0.289 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.Clearance_Microsome_AZ | 0.365 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.DILI | 0.886 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.HIA_Hou | 0.974 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.Half_Life_Obach | 0.085 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.LD50_Zhu | 0.678 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.Lipophilicity_AstraZeneca | 0.572 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.PPBR_AZ | 9.373 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.Pgp_Broccatelli | 0.892 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.Solubility_AqSolDB | 0.776 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.VDss_Lombardo | 0.241 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.hERG | 0.825 | AttentiveFP
Therapeutics Data Commons | tdcommons | TDC.AMES | 0.842 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.BBB_Martins | 0.892 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.Bioavailability_Ma | 0.577 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.CYP2C9_Inhibition_Veith | 0.829 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.CYP2C9_Substrate_CarbonMangels | 0.381 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.CYP2D6_Inhibition_Veith | 0.721 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.CYP2D6_Substrate_CarbonMangels | 0.704 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.CYP3A4_Inhibition_Veith | 0.902 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.CYP3A4_Substrate_CarbonMangels | 0.582 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.Caco2_Wang | 0.546 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.Clearance_Hepatocyte_AZ | 0.413 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.Clearance_Microsome_AZ | 0.585 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.DILI | 0.919 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.HIA_Hou | 0.978 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.Half_Life_Obach | 0.151 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.LD50_Zhu | 0.685 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.Lipophilicity_AstraZeneca | 0.547 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.PPBR_AZ | 10.075 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.Pgp_Broccatelli | 0.929 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.Solubility_AqSolDB | 1.026 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.VDss_Lombardo | 0.559 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.hERG | 0.778 | AttrMasking
Therapeutics Data Commons | tdcommons | TDC.AMES | 0.818 | GCN
Therapeutics Data Commons | tdcommons | TDC.BBB_Martins | 0.842 | GCN
Therapeutics Data Commons | tdcommons | TDC.Bioavailability_Ma | 0.566 | GCN
Therapeutics Data Commons | tdcommons | TDC.CYP2C9_Inhibition_Veith | 0.735 | GCN
Therapeutics Data Commons | tdcommons | TDC.CYP2C9_Substrate_CarbonMangels | 0.344 | GCN
Therapeutics Data Commons | tdcommons | TDC.CYP2D6_Inhibition_Veith | 0.616 | GCN
Therapeutics Data Commons | tdcommons | TDC.CYP2D6_Substrate_CarbonMangels | 0.617 | GCN
Therapeutics Data Commons | tdcommons | TDC.CYP3A4_Inhibition_Veith | 0.84 | GCN
Therapeutics Data Commons | tdcommons | TDC.CYP3A4_Substrate_CarbonMangels | 0.59 | GCN
Therapeutics Data Commons | tdcommons | TDC.Caco2_Wang | 0.599 | GCN
Therapeutics Data Commons | tdcommons | TDC.Clearance_Hepatocyte_AZ | 0.366 | GCN
Therapeutics Data Commons | tdcommons | TDC.Clearance_Microsome_AZ | 0.532 | GCN
Therapeutics Data Commons | tdcommons | TDC.DILI | 0.859 | GCN
Therapeutics Data Commons | tdcommons | TDC.HIA_Hou | 0.936 | GCN
Therapeutics Data Commons | tdcommons | TDC.Half_Life_Obach | 0.239 | GCN
Therapeutics Data Commons | tdcommons | TDC.LD50_Zhu | 0.649 | GCN
Therapeutics Data Commons | tdcommons | TDC.Lipophilicity_AstraZeneca | 0.541 | GCN
Therapeutics Data Commons | tdcommons | TDC.PPBR_AZ | 10.194 | GCN
Therapeutics Data Commons | tdcommons | TDC.Pgp_Broccatelli | 0.895 | GCN
Therapeutics Data Commons | tdcommons | TDC.Solubility_AqSolDB | 0.907 | GCN
Therapeutics Data Commons | tdcommons | TDC.VDss_Lombardo | 0.457 | GCN
Therapeutics Data Commons | tdcommons | TDC.hERG | 0.738 | GCN
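
The BBBP rows above report ROC-AUC as a percentage (89.2 for AttrMasking, 85.5 for AttentiveFP), and the same two models show 0.892 and 0.855 under TDC.BBB_Martins, which suggests the same quantity on a 0 to 1 scale. As a reminder of what that number measures, here is a minimal sketch of ROC-AUC in plain Python, computed as the fraction of positive/negative pairs that the model ranks correctly. The function name `roc_auc` and the toy labels and scores are made up for illustration; they are not TDC data and this is not TDC's evaluator.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive example receives
    a higher score than a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 3 positives, 3 negatives.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.1]

auc = roc_auc(labels, scores)       # 8 of the 9 pairs are ranked correctly
auc_percent = round(100 * auc, 1)   # percentage form, as in the BBBP rows
print(auc, auc_percent)
```

The pairwise loop costs O(P x N) for P positives and N negatives, which is fine for a sketch; library implementations typically sort the scores once instead.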

Related Papers

Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
A Graph-in-Graph Learning Framework for Drug-Target Interaction Prediction (2025-07-15)
Graph Learning (2025-07-08)
Acquiring and Adapting Priors for Novel Tasks via Neural Meta-Architectures (2025-07-07)
Combining Graph Neural Networks and Mixed Integer Linear Programming for Molecular Inference under the Two-Layered Model (2025-07-05)
Exploring Modularity of Agentic Systems for Drug Discovery (2025-06-27)
TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence (2025-06-26)
Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design (2025-06-26)