Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding

Zhaocheng Zhu, Shizhen Xu, Meng Qu, Jian Tang

Date: 2019-03-02
Tasks: Knowledge Graph Embedding, Dimensionality Reduction, Vocal Bursts Intensity Prediction, Network Embedding, Node Classification, Link Prediction

Abstract

Learning continuous representations of nodes has recently attracted growing interest in both academia and industry, owing to the simplicity and effectiveness of these representations in a variety of applications. Most existing node embedding algorithms and systems are capable of processing networks with hundreds of thousands or a few million nodes. However, how to scale them to networks that have tens of millions or even hundreds of millions of nodes remains a challenging problem. In this paper, we propose GraphVite, a high-performance CPU-GPU hybrid system for training node embeddings, by co-optimizing the algorithm and the system. On the CPU end, augmented edge samples are generated in parallel by random walks on the network in an online fashion, and serve as the training data. On the GPU end, a novel parallel negative sampling scheme is proposed to leverage multiple GPUs to train node embeddings simultaneously, without much data transfer and synchronization. Moreover, an efficient collaboration strategy is proposed to further reduce the synchronization cost between CPUs and GPUs. Experiments on multiple real-world networks show that GraphVite is highly efficient. It takes only about one minute for a network with 1 million nodes and 5 million edges on a single machine with 4 GPUs, and around 20 hours for a network with 66 million nodes and 1.8 billion edges. Compared to the current fastest system, GraphVite is about 50 times faster without any sacrifice in performance.
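The CPU-side step described in the abstract, producing augmented edge samples from random walks, can be illustrated with a small sketch. This is not GraphVite's implementation: the function name `augmented_edge_samples`, the parameters `walk_length` and `augmentation_distance`, and the toy graph are all assumptions made for illustration. The sketch runs one uniform random walk and pairs each node with the nodes that follow it within a fixed number of steps.

```python
import random
from typing import Dict, List, Tuple


def augmented_edge_samples(
    adjacency: Dict[int, List[int]],
    start: int,
    walk_length: int,
    augmentation_distance: int,
    rng: random.Random,
) -> List[Tuple[int, int]]:
    """Run one random walk from `start` and pair each visited node with the
    nodes that follow it within `augmentation_distance` steps of the walk.

    Hypothetical helper for illustration only; not part of GraphVite.
    """
    walk = [start]
    for _ in range(walk_length):
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))

    samples = []
    for i, source in enumerate(walk):
        # Nodes up to `augmentation_distance` steps ahead become extra
        # (source, target) training pairs, not only the direct neighbor.
        for target in walk[i + 1 : i + 1 + augmentation_distance]:
            samples.append((source, target))
    return samples


# Toy path graph 0 - 1 - 2 - 3 (an assumption, not a dataset from the paper).
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
samples = augmented_edge_samples(
    adjacency, start=0, walk_length=4, augmentation_distance=2,
    rng=random.Random(0),
)
```

In a real system many such walks would run in parallel on CPU threads and feed the GPU trainers; the sketch only shows how a single walk yields more positive samples than the walk has edges.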

Results

Task                 Dataset     Metric              Value   Model
Link Prediction      FB15k       Hits@1              0.721   SimplE
Link Prediction      FB15k       Hits@10             0.876   SimplE
Link Prediction      FB15k       Hits@3              0.818   SimplE
Link Prediction      FB15k       MR                  74      SimplE
Link Prediction      FB15k       MRR                 0.779   SimplE
Link Prediction      FB15k       training time (s)   2105    SimplE
Link Prediction      WN18        Hits@10             0.954   SimplE
Link Prediction      WN18        Hits@3              0.95    SimplE
Link Prediction      WN18        MR                  412     SimplE
Link Prediction      WN18        MRR                 0.948   SimplE
Link Prediction      WN18        training time (s)   1042    SimplE
Link Prediction      FB15k-237   Hits@1              0.217   RotatE
Link Prediction      FB15k-237   Hits@10             0.511   RotatE
Link Prediction      FB15k-237   Hits@3              0.347   RotatE
Link Prediction      FB15k-237   MR                  176     RotatE
Link Prediction      FB15k-237   MRR                 0.314   RotatE
Link Prediction      FB15k-237   training time (s)   857     RotatE
Node Classification  YouTube     Macro-F1@2%         33.69   LINE
Node Classification  YouTube     Micro-F1@2%         40.61   LINE
Node Classification  YouTube     runtime (s)         70.09   LINE
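
The link prediction metrics reported in the results (MR, MRR, and Hits@k) are all derived from the rank that a model assigns to the correct entity for each test triple. As a minimal sketch, assuming the ranks are 1-based and already computed, they can be aggregated as follows. The function name `ranking_metrics` and the example ranks are hypothetical and are not taken from the paper or its code.

```python
from typing import Dict, Sequence


def ranking_metrics(
    ranks: Sequence[int], ks: Sequence[int] = (1, 3, 10)
) -> Dict[str, float]:
    """Aggregate 1-based ranks of the correct entity into MR, MRR and Hits@k."""
    n = len(ranks)
    if n == 0:
        raise ValueError("ranks must not be empty")
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank, lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank
    }
    for k in ks:
        # Fraction of test triples whose correct entity is ranked in the top k.
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Made-up ranks for four test triples (illustration only).
example = ranking_metrics([1, 2, 4, 20])
```

MR is sensitive to a few badly ranked triples, while MRR and Hits@k emphasize the top of the ranking, which is why results such as those listed here usually report all of them together.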
