Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching

Tim Kaler, Alexandros-Stavros Iliopoulos, Philip Murzynowski, Tao B. Schardl, Charles E. Leiserson, Jie Chen

2023-05-04 · Recommendation Systems

Abstract

Training and inference with graph neural networks (GNNs) on massive graphs has been actively studied since the inception of GNNs, owing to the widespread use and success of GNNs in applications such as recommendation systems and financial forensics. This paper is concerned with minibatch training and inference with GNNs that employ node-wise sampling in distributed settings, where the necessary partitioning of vertex features across distributed storage causes feature communication to become a major bottleneck that hampers scalability. To significantly reduce the communication volume without compromising prediction accuracy, we propose a policy for caching data associated with frequently accessed vertices in remote partitions. The proposed policy is based on an analysis of vertex-wise inclusion probabilities (VIP) during multi-hop neighborhood sampling, which may expand the neighborhood far beyond the partition boundaries of the graph. VIP analysis not only enables the elimination of the communication bottleneck, but it also offers a means to organize in-memory data by prioritizing GPU storage for the most frequently accessed vertex features. We present SALIENT++, which extends the prior state-of-the-art SALIENT system to work with partitioned feature data and leverages the VIP-driven caching policy. SALIENT++ retains the local training efficiency and scalability of SALIENT by using a deep pipeline and drastically reducing communication volume while consuming only a fraction of the storage required by SALIENT. We provide experimental results with the Open Graph Benchmark data sets and demonstrate that training a 3-layer GraphSAGE model with SALIENT++ on 8 single-GPU machines is 7.1× faster than with SALIENT on 1 single-GPU machine, and 12.7× faster than with DistDGL on 8 single-GPU machines.
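
The idea of a VIP-driven cache can be illustrated with a small sketch. The code below is not the SALIENT++ implementation and does not reproduce the paper's exact analysis; it is a simplified model under stated assumptions: each hop samples up to `fanout` neighbors uniformly without replacement, sampling events are treated as independent, and the function names `estimate_vip` and `select_cache` are hypothetical. It propagates inclusion probabilities hop by hop from the local training vertices, then picks the remote vertices most likely to be touched as cache candidates.

```python
from typing import Dict, List, Sequence


def estimate_vip(
    adj: Dict[int, List[int]],
    seed_prob: Dict[int, float],
    fanouts: Sequence[int],
) -> Dict[int, float]:
    """Approximate the probability that each vertex appears in a sampled
    multi-hop neighborhood (illustrative model, independence assumed).

    adj       : vertex -> list of neighbors that may be sampled from it
    seed_prob : vertex -> probability of being a minibatch seed
    fanouts   : number of neighbors sampled per vertex at each hop
    """
    # Probability that a vertex is in the current hop's frontier.
    frontier = {v: seed_prob.get(v, 0.0) for v in adj}
    # Probability that a vertex has not been included in any hop so far.
    not_included = {v: 1.0 - frontier[v] for v in adj}

    for fanout in fanouts:
        # Probability that a vertex is NOT sampled during this hop.
        miss = {v: 1.0 for v in adj}
        for u, p_u in frontier.items():
            nbrs = adj[u]
            if not nbrs or p_u == 0.0:
                continue
            # Chance that one specific neighbor of u is among the samples.
            q = min(1.0, fanout / len(nbrs))
            for v in nbrs:
                miss[v] *= 1.0 - p_u * q
        frontier = {v: 1.0 - m for v, m in miss.items()}
        for v in adj:
            not_included[v] *= miss[v]

    return {v: 1.0 - not_included[v] for v in adj}


def select_cache(
    vip: Dict[int, float],
    partition: Dict[int, int],
    local_part: int,
    capacity: int,
) -> List[int]:
    """Pick the `capacity` remote vertices with the highest estimated VIP."""
    remote = [v for v in vip if partition[v] != local_part]
    remote.sort(key=lambda v: (-vip[v], v))  # ties broken by vertex id
    return remote[:capacity]


# Toy graph split into two partitions; partition 0 trains on vertex 0 only.
adj = {0: [1, 2], 1: [2], 2: [3], 3: []}
partition = {0: 0, 1: 0, 2: 1, 3: 1}
vip = estimate_vip(adj, seed_prob={0: 1.0}, fanouts=[1, 1])
cached = select_cache(vip, partition, local_part=0, capacity=1)
```

In this toy setting, vertex 2 is reachable in both hops and so receives a higher estimated inclusion probability than vertex 3, which makes it the first remote vertex to cache. The same ranking could, in principle, also decide which features are placed in GPU memory first, as the abstract describes.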

Related Papers

IP2: Entity-Guided Interest Probing for Personalized News Recommendation (2025-07-18)
A Reproducibility Study of Product-side Fairness in Bundle Recommendation (2025-07-18)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Looking for Fairness in Recommender Systems (2025-07-16)
Journalism-Guided Agentic In-Context Learning for News Stance Detection (2025-07-15)
LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing (2025-07-12)
When Graph Contrastive Learning Backfires: Spectral Vulnerability and Defense in Recommendation (2025-07-10)