William L. Hamilton, Rex Ying, Jure Leskovec
Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
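To make the "sampling and aggregating" idea concrete, here is a minimal pure-Python sketch of one sample-and-aggregate step with a mean aggregator. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`sample_neighbors`, `mean_vector`, `sage_layer`), the toy graph, the random untrained weights, and the fallback for isolated nodes are all hypothetical choices made for this example.

```python
import math
import random


def sample_neighbors(adj, node, k, rng):
    """Draw a fixed-size sample of k neighbors for one node."""
    neighbors = adj[node]
    if not neighbors:
        return [node]  # assumption: an isolated node falls back to itself
    if len(neighbors) >= k:
        return rng.sample(neighbors, k)  # without replacement
    # assumption: sample with replacement when fewer than k neighbors exist
    return [rng.choice(neighbors) for _ in range(k)]


def mean_vector(vectors):
    """Element-wise mean of equally sized feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_layer(adj, features, weights, k, rng):
    """One step: sample neighbors, average them, concatenate with self, transform."""
    out = {}
    for node in adj:
        sampled = sample_neighbors(adj, node, k, rng)
        neigh = mean_vector([features[n] for n in sampled])
        combined = features[node] + neigh  # list concatenation: [self, neighborhood]
        # Linear map followed by ReLU; each row of `weights` yields one output unit.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, combined))) for row in weights]
        norm = math.sqrt(sum(h * h for h in hidden)) or 1.0  # avoid division by zero
        out[node] = [h / norm for h in hidden]
    return out


rng = random.Random(0)
adj = {0: [1, 2], 1: [0], 2: [0]}  # toy adjacency lists
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
# 3 output units, 4 inputs (2 self features + 2 aggregated features); untrained.
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
emb = sage_layer(adj, features, weights, k=2, rng=rng)

# A node added after the weights were fixed needs only features and edges.
adj[3] = [0, 2]
features[3] = [0.5, 0.5]
emb_new = sage_layer(adj, features, weights, k=2, rng=rng)[3]
```

Because the weights belong to the function rather than to individual nodes, node `3` receives an embedding without any per-node parameters being trained for it, which is the inductive property the abstract describes. A real model would stack several such steps and learn `weights` from a training objective.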
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Graph Regression | ZINC-500k | MAE | 0.398 | GraphSAGE |
| Graph Classification | CIFAR10 100k | Accuracy (%) | 66.08 | GraphSAGE |
| Node Classification | | Accuracy | 38.9 | GraphSAGE (Hamilton et al., [2017a]) |
| Node Classification | Brazil Air-Traffic | Accuracy | 0.404 | GraphSAGE (Hamilton et al., [2017a]) |
| Node Classification | PPI | F1 | 61.2 | GraphSAGE |
| Node Classification | Wiki-Vote | Accuracy | 24.5 | GraphSAGE (Hamilton et al., [2017a]) |
| Node Classification | CiteSeer with Public Split: fixed 20 nodes per class | Accuracy | 67.2 | GraphSAGE |
| Node Classification | Europe Air-Traffic | Accuracy | 27.2 | GraphSAGE (Hamilton et al., [2017a]) |
| Node Classification | Flickr | Accuracy | 0.641 | GraphSAGE (Hamilton et al., [2017a]) |
| Node Classification | USA Air-Traffic | Accuracy | 31.6 | GraphSAGE (Hamilton et al., [2017a]) |
| Node Classification | PATTERN 100k | Accuracy (%) | 50.516 | GraphSAGE |
| Link Property Prediction | ogbl-ddi | Number of params | 1421057 | GraphSAGE |
| Link Property Prediction | ogbl-citation2 | Number of params | 460289 | Full-batch GraphSAGE |
| Link Property Prediction | ogbl-citation2 | Number of params | 460289 | NeighborSampling (SAGE aggr) |
| Link Property Prediction | ogbl-collab | Number of params | 460289 | GraphSAGE (val as input) |
| Link Property Prediction | ogbl-collab | Number of params | 460289 | GraphSAGE |
| Link Property Prediction | ogbl-ppa | Number of params | 424449 | GraphSAGE |
| Classification | CIFAR10 100k | Accuracy (%) | 66.08 | GraphSAGE |
| Node Property Prediction | ogbn-arxiv | Number of params | 218664 | GraphSAGE |
| Node Property Prediction | ogbn-papers100M | Number of params | 5755172 | GraphSAGE_res_incep |
| Node Property Prediction | ogbn-products | Number of params | 103983 | GraphSAGE + C&S + node2vec |
| Node Property Prediction | ogbn-products | Number of params | 206895 | NeighborSampling (SAGE aggr) |
| Node Property Prediction | ogbn-products | Number of params | 206895 | Full-batch GraphSAGE |
| Node Property Prediction | ogbn-proteins | Number of params | 193136 | GraphSAGE |
| Node Property Prediction | ogbn-mag | Number of params | 154366772 | NeighborSampling (R-GCN aggr) |