Data sourced from the PWC Archive (CC-BY-SA 4.0).

Papers/CHILI: Chemically-Informed Large-scale Inorganic Nanomater...

CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning

Ulrik Friis-Jensen, Frederik L. Johansen, Andy S. Anker, Erik B. Dam, Kirsten M. Ø. Jensen, Raghavendra Selvan

2024-02-20 · Tasks: X-ray PDF regression, Crystal system classification, Benchmarking, XRD regression, Neutron PDF regression, Space group classification, SANS regression, SAXS regression, Distance regression, ND regression, Position regression, Atomic number classification

Abstract

Advances in graph machine learning (ML) have been driven by applications in chemistry, as graphs have remained the most expressive representations of molecules. While early graph ML methods focused primarily on small organic molecules, the scope of graph ML has recently expanded to include inorganic materials. Modelling the periodicity and symmetry of inorganic crystalline materials poses unique challenges, which existing graph ML methods are unable to address. Moving to inorganic nanomaterials increases complexity further, as the number of nodes per graph can range widely (from $10$ to $10^5$). The bulk of existing graph ML focuses on characterising molecules and materials by predicting target properties with graphs as input. However, the most exciting applications of graph ML will be in their generative capabilities, which are currently not on par with other domains such as images or text. We invite the graph ML community to address these open challenges by presenting two new chemically-informed large-scale inorganic (CHILI) nanomaterials datasets: a medium-scale dataset (with overall >6M nodes, >49M edges) of mono-metallic oxide nanomaterials generated from 12 selected crystal types (CHILI-3K) and a large-scale dataset (with overall >183M nodes, >1.2B edges) of nanomaterials generated from experimentally determined crystal structures (CHILI-100K). We define 11 property prediction tasks and 6 structure prediction tasks, which are of special interest for nanomaterial research. We benchmark the performance of a wide array of baseline methods and use these benchmarking results to highlight areas that need future work. To the best of our knowledge, CHILI-3K and CHILI-100K are the first open-source nanomaterial datasets of this scale (both on the individual graph level and of the dataset as a whole) and the only nanomaterials datasets with high structural and elemental diversity.
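The abstract treats each nanomaterial as a graph whose nodes are atoms. Below is a minimal sketch of one way such a graph could be built, assuming atoms arrive as (atomic number, x, y, z) tuples and that an edge joins any two atoms closer than a chosen cutoff distance. The cutoff rule, the `build_graph` name, and the toy coordinates are illustrative assumptions, not the CHILI construction pipeline.

```python
import math
from typing import List, Tuple

# Assumed atom format: (atomic number Z, x, y, z), coordinates in angstrom.
Atom = Tuple[int, float, float, float]
Edge = Tuple[int, int, float]  # (node i, node j, interatomic distance)


def build_graph(atoms: List[Atom], cutoff: float) -> Tuple[List[int], List[Edge]]:
    """Hypothetical helper: atomic numbers as node features, plus an
    undirected edge for every atom pair within `cutoff` (assumed rule)."""
    node_features = [z for z, *_ in atoms]
    edges: List[Edge] = []
    # All-pairs check: O(n^2) in the number of atoms.
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = math.dist(atoms[i][1:], atoms[j][1:])
            if d <= cutoff:
                edges.append((i, j, d))
    return node_features, edges


# Toy, made-up cluster: one Ti atom and three O atoms.
toy_cluster = [
    (22, 0.0, 0.0, 0.0),
    (8, 2.0, 0.0, 0.0),
    (8, 0.0, 2.0, 0.0),
    (8, 5.0, 0.0, 0.0),
]
features, edges = build_graph(toy_cluster, cutoff=2.5)
print(features)                        # [22, 8, 8, 8]
print([(i, j) for i, j, _ in edges])   # [(0, 1), (0, 2)]
```

The all-pairs loop is fine for the smallest clusters but becomes impractical toward the $10^5$-node end of the range; a spatial grid or k-d tree neighbour search is the usual way to scale this step.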

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Graph Classification | CHILI-3K | F1-score (Weighted) | 0.108 | Most Frequent Class |
| Graph Classification | CHILI-100K | F1-score (Weighted) | 0.01 | Most Frequent Class |
| Graph Classification | CHILI-3K | F1-score (Weighted) | 0.44 | Most Frequent Class |
| Graph Classification | CHILI-100K | F1-score (Weighted) | 0.046 | Most Frequent Class |
| Node Classification | CHILI-100K | F1-score (Weighted) | 0.192 | Most Frequent Class |
| Node Classification | CHILI-3K | F1-score (Weighted) | 0.461 | Most Frequent Class |
| Graph Property Prediction | CHILI-100K | MSE | 0.038 | Mean |
| Graph Property Prediction | CHILI-3K | MSE | 0.037 | Mean |
| Graph Property Prediction | CHILI-3K | MSE | 0.017 | Mean |
| Graph Property Prediction | CHILI-100K | MSE | 0.021 | Mean |
| Graph Property Prediction | CHILI-3K | MSE | 0.008 | Mean |
| Graph Property Prediction | CHILI-100K | MSE | 0.007 | Mean |
| Classification | CHILI-3K | F1-score (Weighted) | 0.108 | Most Frequent Class |
| Classification | CHILI-100K | F1-score (Weighted) | 0.01 | Most Frequent Class |
| Classification | CHILI-3K | F1-score (Weighted) | 0.44 | Most Frequent Class |
| Classification | CHILI-100K | F1-score (Weighted) | 0.046 | Most Frequent Class |
| Node Property Prediction | CHILI-3K | Positional MAE | 16.575 | Mean |
| Node Property Prediction | CHILI-100K | Positional MAE | 16.336 | Mean |
| Distance regression | CHILI-3K | MSE | 0.265 | Mean |
| Distance regression | CHILI-100K | MSE | 0.307 | Mean |
| Atomic number classification | CHILI-100K | F1-score (Weighted) | 0.192 | Most Frequent Class |
| Atomic number classification | CHILI-3K | F1-score (Weighted) | 0.461 | Most Frequent Class |
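The two reference models in the table, Most Frequent Class and Mean, can be sketched as follows. This is a minimal illustration assuming the baselines are fit on training labels (majority class or target mean) and scored with support-weighted F1 (the convention behind scikit-learn's `average="weighted"`) and plain mean squared error. The function names and toy data are hypothetical, and the sketch does not reproduce the reported numbers.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def most_frequent_class_baseline(train_labels: Sequence[Hashable], n_test: int) -> List[Hashable]:
    """Predict the training-set majority class for every test item."""
    mode = Counter(train_labels).most_common(1)[0][0]
    return [mode] * n_test


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1, averaged with weights proportional to true-class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total
    return score


def mean_baseline(train_targets: Sequence[float], n_test: int) -> List[float]:
    """Predict the training-set mean target for every test item."""
    mu = sum(train_targets) / len(train_targets)
    return [mu] * n_test


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy classification example: the majority class "a" earns credit only on class "a".
train_labels = ["a", "a", "b"]
test_labels = ["a", "a", "a", "b"]
preds = most_frequent_class_baseline(train_labels, len(test_labels))
print(round(weighted_f1(test_labels, preds), 4))  # 0.6429

# Toy regression example.
reg_preds = mean_baseline([1.0, 2.0, 3.0], 2)
print(mse([2.0, 4.0], reg_preds))  # 2.0
```

Because a constant prediction scores zero F1 on every class except the one it predicts, the weighted F1 of the Most Frequent Class baseline shrinks as the label distribution becomes more diverse.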
