GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding

Zhaocheng Zhu, Shizhen Xu, Meng Qu, Jian Tang

2019-03-02

Tasks: Knowledge Graph Embedding, Dimensionality Reduction, Vocal Bursts Intensity Prediction, Network Embedding, Node Classification, Link Prediction


Abstract

Learning continuous representations of nodes has recently attracted growing interest in both academia and industry, owing to their simplicity and effectiveness across a variety of applications. Most existing node embedding algorithms and systems can process networks with hundreds of thousands or a few million nodes. However, scaling them to networks with tens or even hundreds of millions of nodes remains a challenging problem. In this paper, we propose GraphVite, a high-performance CPU-GPU hybrid system for training node embeddings, built by co-optimizing the algorithm and the system. On the CPU side, augmented edge samples are generated online and in parallel by random walks on the network, and serve as the training data. On the GPU side, a novel parallel negative sampling scheme lets multiple GPUs train node embeddings simultaneously with little data transfer or synchronization. Moreover, an efficient collaboration strategy further reduces the synchronization cost between CPUs and GPUs. Experiments on multiple real-world networks show that GraphVite is highly efficient: training takes only about one minute on a network with 1 million nodes and 5 million edges using a single machine with 4 GPUs, and around 20 hours on a network with 66 million nodes and 1.8 billion edges. Compared to the current fastest system, GraphVite is about 50 times faster without any sacrifice in performance.
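The abstract names two algorithmic pieces without detailing them: online augmentation of edge samples by random walks on the CPU, and parallel negative sampling across GPUs. The Python sketch below illustrates one way these could fit together. It assumes a grid-partitioning scheme that the abstract itself does not spell out: nodes are split into `n` partitions, edge samples are bucketed into an n-by-n grid by (vertex partition, context partition), and in episode `k` GPU `i` trains block `(i, (i + k) mod n)`, so no two GPUs touch the same vertex or context embeddings at once. All function names, the modulo partition function, and the parameters `walk_length`, `aug_step` and `num_walks` are hypothetical; this is not GraphVite's API.

```python
import random
from collections import defaultdict


def augmented_edge_samples(adj, walk_length, aug_step, num_walks, rng):
    """Run random walks and emit every (u, v) pair at most aug_step hops
    apart along a walk. These pairs serve as augmented edge samples."""
    nodes = list(adj)
    samples = []
    for _ in range(num_walks):
        walk = [rng.choice(nodes)]
        # Extend the walk until it reaches walk_length or hits a dead end.
        while len(walk) < walk_length and adj[walk[-1]]:
            walk.append(rng.choice(adj[walk[-1]]))
        for i, u in enumerate(walk):
            for j in range(i + 1, min(i + aug_step + 1, len(walk))):
                samples.append((u, walk[j]))
    return samples


def partition_of(node, num_parts):
    """Hypothetical partition function: assign nodes round-robin by id."""
    return node % num_parts


def bucket_samples(samples, num_parts):
    """Group samples into a num_parts x num_parts grid keyed by
    (vertex partition of u, context partition of v)."""
    grid = defaultdict(list)
    for u, v in samples:
        grid[(partition_of(u, num_parts), partition_of(v, num_parts))].append((u, v))
    return grid


def orthogonal_schedule(num_parts):
    """Episode k gives GPU i the block (i, (i + k) % num_parts), so within an
    episode no two GPUs share a vertex partition or a context partition."""
    return [[(i, (i + k) % num_parts) for i in range(num_parts)]
            for k in range(num_parts)]


def sample_negatives(context_part, num_parts, num_nodes, count, rng):
    """Draw negative contexts only from the partition this GPU owns."""
    candidates = [v for v in range(num_nodes)
                  if partition_of(v, num_parts) == context_part]
    return [rng.choice(candidates) for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(0)
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    samples = augmented_edge_samples(adj, walk_length=5, aug_step=2,
                                     num_walks=4, rng=rng)
    grid = bucket_samples(samples, num_parts=2)
    for k, episode in enumerate(orthogonal_schedule(2)):
        for gpu, block in enumerate(episode):
            negs = sample_negatives(block[1], 2, len(adj), 3, rng)
            print(f"episode {k}, GPU {gpu}: block {block}, "
                  f"{len(grid[block])} samples, negatives {negs}")
```

Because the blocks in each episode are pairwise disjoint in both partitions, each GPU can update its embedding shards independently, and drawing negatives only from the assigned context partition keeps every update local to that GPU.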

Results

Task	Dataset	Metric	Value	Model
Link Prediction	FB15k	Hits@1	0.721	SimplE
Link Prediction	FB15k	Hits@10	0.876	SimplE
Link Prediction	FB15k	Hits@3	0.818	SimplE
Link Prediction	FB15k	MR	74	SimplE
Link Prediction	FB15k	MRR	0.779	SimplE
Link Prediction	FB15k	training time (s)	2105	SimplE
Link Prediction	WN18	Hits@10	0.954	SimplE
Link Prediction	WN18	Hits@3	0.95	SimplE
Link Prediction	WN18	MR	412	SimplE
Link Prediction	WN18	MRR	0.948	SimplE
Link Prediction	WN18	training time (s)	1042	SimplE
Link Prediction	FB15k-237	Hits@1	0.217	RotatE
Link Prediction	FB15k-237	Hits@10	0.511	RotatE
Link Prediction	FB15k-237	Hits@3	0.347	RotatE
Link Prediction	FB15k-237	MR	176	RotatE
Link Prediction	FB15k-237	MRR	0.314	RotatE
Link Prediction	FB15k-237	training time (s)	857	RotatE
Node Classification	YouTube	Macro-F1@2%	33.69	LINE
Node Classification	YouTube	Micro-F1@2%	40.61	LINE
Node Classification	YouTube	runtime (s)	70.09	LINE
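The link prediction rows report MR (mean rank), MRR (mean reciprocal rank) and Hits@k (the fraction of test queries whose correct entity is ranked in the top k). Lower MR is better; higher MRR and Hits@k are better. A minimal Python sketch of these standard definitions follows, assuming the 1-based rank of the correct entity has already been computed for every test query. The table does not say whether ranks are raw or filtered, and the function name `ranking_metrics` is hypothetical.

```python
from typing import Dict, Iterable, Sequence


def ranking_metrics(ranks: Sequence[int],
                    ks: Iterable[int] = (1, 3, 10)) -> Dict[str, float]:
    """MR, MRR and Hits@k from 1-based ranks of the correct entity."""
    if not ranks or min(ranks) < 1:
        raise ValueError("ranks must be a non-empty sequence of ranks >= 1")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank
    }
    for k in ks:
        # Share of queries whose correct entity is ranked k or better.
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics


print(ranking_metrics([1, 2, 4, 20]))
```

For the ranks `[1, 2, 4, 20]` this gives MR = 6.75, MRR = 0.45, Hits@1 = 0.25, Hits@3 = 0.5 and Hits@10 = 0.75, on the same fractional scale as the MRR and Hits@k values in the table.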
