High Performance of Gradient Boosting in Binding Affinity Prediction

Dmitrii Gavrilev, Nurlybek Amangeldiuly, Sergei Ivanov, Evgeny Burnaev

2022-05-14

Vocal Bursts Intensity Prediction, Drug Discovery

Abstract

Prediction of protein-ligand (PL) binding affinity remains key to drug discovery. Popular approaches in recent years involve graph neural networks (GNNs), which are used to learn the topology and geometry of PL complexes. However, GNNs are computationally heavy and scale poorly with graph size. On the other hand, traditional machine learning (ML) approaches, such as gradient-boosted decision trees (GBDTs), are lightweight yet highly effective on tabular data. We propose using PL interaction features together with PL graph-level features as inputs to a GBDT. We show that this combination outperforms existing solutions.
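To make the idea of feeding two feature groups into a GBDT concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes each complex is already described by two small numeric vectors (the names `interaction` and `graph_level`, the synthetic data, and the stump-based booster are all illustrative assumptions), concatenates them into one tabular row, and fits a toy gradient-boosted ensemble of depth-1 trees on squared error. The reported results use LightGBM, which this pure-Python booster only imitates in spirit.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate split between two distinct values
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]

def predict_stump(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv

def fit_gbdt(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting on squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict_gbdt(model, row):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)

def sse(y_true, y_pred):
    """Sum of squared errors."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred))

# Synthetic placeholders (assumption): two feature groups per complex.
rng = random.Random(0)
interaction = [[rng.random(), rng.random()] for _ in range(60)]
graph_level = [[rng.random(), rng.random()] for _ in range(60)]

# Concatenate both groups into a single tabular row per complex.
X = [a + b for a, b in zip(interaction, graph_level)]
# Synthetic target that depends on one feature from each group.
y = [3 * row[0] + 2 * row[2] + rng.gauss(0, 0.05) for row in X]

model = fit_gbdt(X, y)
baseline_sse = sse(y, [sum(y) / len(y)] * len(y))        # predict-the-mean baseline
model_sse = sse(y, [predict_gbdt(model, r) for r in X])  # training error only
```

Because the target here depends on features from both groups, the booster can only fit it well when both are present in the row, which is the intuition behind combining them. The comparison is on training data and says nothing about generalization.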

Results

Task	Dataset	Metric	Value	Model
Protein-Ligand Affinity Prediction	PDBbind	RMSE	1.316	LightGBM
Protein-Ligand Affinity Prediction	CSAR-HiQ	RMSE	1.725	LightGBM
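The metric in the table is root-mean-square error (RMSE), where lower is better. As a reminder of how it is computed, here is a small sketch; the `measured` and `predicted` values are made-up numbers for illustration and are not taken from PDBbind or CSAR-HiQ.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical affinity values for three complexes.
measured = [6.0, 4.5, 7.2]
predicted = [5.0, 4.5, 9.2]
score = rmse(measured, predicted)  # errors are 1, 0 and -2, so sqrt(5 / 3)
```

Since errors are squared before averaging, a single badly mispredicted complex raises RMSE more than several small misses of the same total size.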
