Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

G2GT: Retrosynthesis Prediction with Graph to Graph Attention Neural Network and Self-Training

Zaiyun Lin, Shiqiu Yin, Lei Shi, Wenbiao Zhou, YingSheng Zhang

2022-04-19 | Ensemble Learning | Data Augmentation | Retrosynthesis | Single-step retrosynthesis | Graph Attention

Abstract

Retrosynthesis prediction is one of the fundamental challenges in organic chemistry and related fields. The goal is to find reactant molecules that can synthesize a given product molecule. To solve this task, we propose a new graph-to-graph transformation model, G2GT, in which the graph encoder and graph decoder are built upon the standard transformer structure. We also show that self-training, a powerful data augmentation method that utilizes unlabeled molecule data, can significantly improve the model's performance. Inspired by the reaction type label and ensemble learning, we propose a novel weak ensemble method to enhance diversity. We combine beam search, nucleus sampling, and top-k sampling to further improve inference diversity, and we propose a simple ranking algorithm to retrieve the final top-10 results. We achieve new state-of-the-art results on both the USPTO-50K dataset, with a top-1 accuracy of 54%, and the larger USPTO-full dataset, with a top-1 accuracy of 50%, along with competitive top-10 results.
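
The abstract mentions merging candidates from several decoding strategies and ranking them to obtain a final top-10 list, but it does not spell out the ranking rule. The sketch below is therefore only an illustration under stated assumptions, not the authors' algorithm: the function name `merge_and_rank`, the voting rule (count how many candidate lists contain a prediction, break ties by best rank), and the placeholder candidate strings are all hypothetical.

```python
from collections import defaultdict


def merge_and_rank(candidate_lists, k=10):
    """Merge ranked candidate lists and return the top-k candidates.

    Assumption (not taken from the paper): a candidate's score is the
    number of lists that contain it. Ties are broken by the candidate's
    best (lowest) rank in any list, then alphabetically for determinism.
    """
    votes = defaultdict(int)
    best_rank = {}
    for candidates in candidate_lists:
        seen = set()
        for rank, cand in enumerate(candidates):
            if cand in seen:
                continue  # count each candidate at most once per list
            seen.add(cand)
            votes[cand] += 1
            best_rank[cand] = min(rank, best_rank.get(cand, rank))
    ordered = sorted(votes, key=lambda c: (-votes[c], best_rank[c], c))
    return ordered[:k]


# Placeholder reactant strings standing in for decoder outputs.
beam = ["A.B", "C.D", "E.F"]
nucleus = ["C.D", "A.B", "G.H"]
top_k = ["C.D", "I.J"]

final = merge_and_rank([beam, nucleus, top_k], k=3)
print(final)  # ['C.D', 'A.B', 'I.J']
```

Here "C.D" appears in all three lists and ranks first, "A.B" appears in two, and "I.J" wins the tie among single-vote candidates because its best rank (second place) is the highest of the three.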

Results

Task | Dataset | Metric | Value (%) | Model
Single-step retrosynthesis | USPTO-50k | Top-1 accuracy | 54.1 | G2GT (reaction class unknown)
Single-step retrosynthesis | USPTO-50k | Top-3 accuracy | 69.9 | G2GT (reaction class unknown)
Single-step retrosynthesis | USPTO-50k | Top-5 accuracy | 74.5 | G2GT (reaction class unknown)
Single-step retrosynthesis | USPTO-50k | Top-10 accuracy | 77.7 | G2GT (reaction class unknown)
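
For readers unfamiliar with the metric, top-k accuracy is the fraction of test products whose ground-truth reactant set appears among the model's first k predictions. The following is a minimal sketch of that definition; the function name and the toy data are illustrative assumptions, and it compares strings exactly, whereas evaluations in this area typically canonicalize the molecule strings before comparison.

```python
def top_k_accuracy(predictions, targets, k):
    """Return the fraction of examples whose target is in the first k predictions.

    predictions: list of ranked candidate lists, one per example.
    targets: list of ground-truth answers, one per example.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        1 for preds, target in zip(predictions, targets) if target in preds[:k]
    )
    return hits / len(targets)


# Toy data: four examples with three ranked candidates each (hypothetical).
preds = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"], ["m", "n", "o"]]
targets = ["a", "z", "q", "w"]

print(top_k_accuracy(preds, targets, k=1))  # 0.25
print(top_k_accuracy(preds, targets, k=3))  # 0.75
```

Because the first k predictions always contain the first k-1, the metric can only stay flat or rise as k grows, which matches the ordering of the values in the table above.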

Related Papers

Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Catching Bid-rigging Cartels with Graph Attention Neural Networks (2025-07-16)
Data Augmentation in Time Series Forecasting through Inverted Framework (2025-07-15)
Iceberg: Enhancing HLS Modeling with Synthetic Data (2025-07-14)
Wavelet-Enhanced Neural ODE and Graph Attention for Interpretable Energy Forecasting (2025-07-14)