Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A Systematic Evaluation of Knowledge Graph Embeddings for Gene-Disease Association Prediction

Catarina Canastra, Cátia Pesquita

2025-04-11
Tasks: Knowledge Graphs, Knowledge Graph Embeddings, Prediction, Link Prediction

Abstract

Discovering gene-disease links is important in biology and medicine, enabling disease identification and drug repurposing. Machine learning approaches accelerate this process by leveraging biological knowledge represented in ontologies and the structure of knowledge graphs. Still, many existing works overlook ontologies that explicitly represent diseases, missing the causal and semantic relationships between them. Gene-disease association prediction is naturally framed as a link prediction task, in which embedding algorithms directly predict associations by exploring the structure and properties of the knowledge graph. Some works instead frame it as a node-pair classification task, combining embedding algorithms with traditional machine learning algorithms. This strategy aligns with the logic of a machine learning pipeline; however, its effectiveness may be constrained by its reliance on negative examples and by the lack of validated gene-disease associations for training the embedding models. This work introduces a novel framework for comparing the performance of link prediction and node-pair classification, analyses the performance of state-of-the-art gene-disease association approaches, and compares different order-based formalizations of gene-disease association prediction. It also evaluates the impact of semantic richness, introduced through a disease-specific ontology and additional links between ontologies. The framework involves five steps: data splitting, knowledge graph integration, embedding, modeling and prediction, and method evaluation. Results show that enriching the semantic representation of diseases slightly improves performance, while additional links between ontologies have a greater impact. Link prediction methods better exploit the semantic richness encoded in knowledge graphs. Although node-pair classification methods identify all true positives, link prediction methods achieve better overall performance.
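To make the link prediction formulation more concrete, here is a minimal sketch, not the authors' implementation. It assumes a TransE-style translational score (one common knowledge graph embedding model, chosen here only for illustration; the abstract does not say which embedding methods were evaluated) and hand-set toy 2-D embeddings in place of trained ones. Candidate diseases for each gene are ranked by score, and the rankings are evaluated with mean reciprocal rank (MRR) and Hits@k, two standard rank-based link prediction metrics. All names (`transe_score`, `rank_diseases`, `ASSOCIATED_WITH`, the `G*`/`D*` entities) are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style plausibility of a triple: -||h + r - t||_2 (higher is more plausible)."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_diseases(gene_vec, relation_vec, disease_vecs):
    """Return disease names sorted from most to least plausible for one gene."""
    scored = [(name, transe_score(gene_vec, relation_vec, vec))
              for name, vec in disease_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored]


def mean_reciprocal_rank(rankings, answers):
    """Average of 1/rank of each held-out disease within its gene's ranking."""
    return sum(1.0 / (r.index(a) + 1) for r, a in zip(rankings, answers)) / len(rankings)


def hits_at_k(rankings, answers, k):
    """Fraction of queries whose held-out disease appears in the top k."""
    return sum(a in r[:k] for r, a in zip(rankings, answers)) / len(rankings)


# Toy embeddings (hypothetical, hand-set rather than trained).
ASSOCIATED_WITH = [1.0, 0.0]  # relation vector for a hypothetical gene-disease edge type
GENES = {"G1": [0.0, 0.0], "G2": [0.0, 1.0]}
DISEASES = {"D1": [1.0, 0.0], "D2": [1.0, 1.0], "D3": [5.0, 5.0]}

# Held-out (test) associations: G1 -> D1 and G2 -> D1.
test_pairs = [("G1", "D1"), ("G2", "D1")]
rankings = [rank_diseases(GENES[g], ASSOCIATED_WITH, DISEASES) for g, _ in test_pairs]
answers = [d for _, d in test_pairs]

print(rankings)                                 # [['D1', 'D2', 'D3'], ['D2', 'D1', 'D3']]
print(mean_reciprocal_rank(rankings, answers))  # 0.75
print(hits_at_k(rankings, answers, 1))          # 0.5
```

Here the model scores every candidate disease directly from the embedding geometry, so no classifier or explicit negative set is needed at prediction time.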

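The node-pair classification alternative can be sketched in the same spirit. This is again a hypothetical illustration, not the paper's pipeline: each (gene, disease) pair becomes a feature vector via the element-wise (Hadamard) product of the two node embeddings; negatives are drawn by randomly pairing genes with diseases not known to be associated (a closed-world assumption, which relates to the abstract's concern about negative examples); and a plain logistic regression trained by stochastic gradient descent stands in for the traditional machine learning algorithm. The embeddings, function names, and hyperparameters are assumptions for illustration only.

```python
import math
import random


def hadamard(u, v):
    """Pair feature: element-wise product of the gene and disease embeddings."""
    return [a * b for a, b in zip(u, v)]


def sample_negatives(genes, diseases, positives, n, rng):
    """Draw n distinct (gene, disease) pairs that are not in the known-positive set."""
    available = len(genes) * len(diseases) - len(positives)
    if n > available:
        raise ValueError(f"only {available} unlabeled pairs exist, asked for {n}")
    negatives = set()
    while len(negatives) < n:
        pair = (rng.choice(genes), rng.choice(diseases))
        if pair not in positives:  # closed-world assumption: unknown means negative
            negatives.add(pair)
    return sorted(negatives)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(features, labels, epochs=500, lr=0.5, seed=0):
    """Fit weights and bias by per-example gradient descent on the log-loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(features[0]), 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            grad = p - y  # derivative of the log-loss with respect to the logit
            w = [wj - lr * grad * xj for wj, xj in zip(w, x)]
            b -= lr * grad
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that the pair behind feature vector x is associated."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy embeddings (hypothetical, hand-set rather than produced by an embedding model).
EMB = {"G1": [1.0, 0.5], "G2": [-1.0, -0.5], "D1": [1.0, 0.5], "D2": [-1.0, -0.5]}
positives = {("G1", "D1"), ("G2", "D2")}
negatives = sample_negatives(["G1", "G2"], ["D1", "D2"], positives, n=2, rng=random.Random(42))

pairs = sorted(positives) + negatives
X = [hadamard(EMB[g], EMB[d]) for g, d in pairs]
y = [1] * len(positives) + [0] * len(negatives)
w, b = train_logistic_regression(X, y)

for (g, d), x in zip(pairs, X):
    print(g, d, round(predict_proba(w, b, x), 3))
```

In this sketch the classifier learns only from the negatives the sampler produces, so a true but unrecorded association drawn as a negative would be learned as a non-association.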
Related Papers

Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
SMART: Relation-Aware Learning of Geometric Representations for Knowledge Graphs (2025-07-17)
Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins (2025-07-11)
Foundation models for time series forecasting: Application in conformal prediction (2025-07-09)
Topic Modeling and Link-Prediction for Material Property Discovery (2025-07-08)
Predicting Graph Structure via Adapted Flux Balance Analysis (2025-07-08)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)