Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio
We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).
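
The neighborhood attention described above can be illustrated with a minimal single-head sketch in plain Python. The abstract does not spell out the scoring function, so this sketch assumes the commonly cited GAT formulation: a shared linear transform `W`, a score `LeakyReLU(a^T [z_i || z_j])` with negative slope 0.2, and a softmax restricted to each node's neighbors. The names `gat_layer`, `a_src`, `a_dst` and the toy graph are hypothetical, and the output nonlinearity and multi-head averaging are left out.

```python
import math


def leaky_relu(x, slope=0.2):
    # Negative slope 0.2 is an assumption taken from the usual GAT setup.
    return x if x > 0 else slope * x


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gat_layer(features, neighbors, W, a_src, a_dst):
    """Single-head masked self-attention over each node's neighborhood.

    features:  list of N input vectors, each of length F
    neighbors: neighbors[i] lists the nodes that node i attends over
               (assumed non-empty and to include i itself)
    W:         F x F' weight matrix as a list of rows, shared by all nodes
    a_src, a_dst: the two halves of the attention vector a, each of length F'
    """
    n_out = len(W[0])
    # Shared linear transform applied to every node: z_i = h_i W
    z = [[sum(h[k] * W[k][j] for k in range(len(h))) for j in range(n_out)]
         for h in features]
    # a^T [z_i || z_j] splits into a_src . z_i + a_dst . z_j
    s_src = [dot(zi, a_src) for zi in z]
    s_dst = [dot(zi, a_dst) for zi in z]

    outputs, attention = [], []
    for i, nbrs in enumerate(neighbors):
        # Masking: scores are computed only for nodes in the neighborhood.
        scores = [leaky_relu(s_src[i] + s_dst[j]) for j in nbrs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        alpha = [w / total for w in weights]
        attention.append(dict(zip(nbrs, alpha)))
        # New feature of node i: attention-weighted sum of neighbor features.
        outputs.append([sum(al * z[j][d] for al, j in zip(alpha, nbrs))
                        for d in range(n_out)])
    return outputs, attention


# Toy 3-node path graph 0 - 1 - 2 with self-loops (illustrative values only).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1], [1, 0, 2], [2, 1]]
W = [[1.0, 0.0], [0.0, 1.0]]
a_src, a_dst = [0.5, -0.5], [1.0, 0.25]

out, attn = gat_layer(features, neighbors, W, a_src, a_dst)
for i, weights in enumerate(attn):
    print(i, weights)
```

Because the layer only ever reads `neighbors[i]`, it needs no global matrix operation and can be applied unchanged to a graph that was not seen during training, which is the property the abstract relies on for the inductive setting.
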
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | J-HMDB Early Action | 10% | 58.1 | GAT |
| Temporal Action Localization | J-HMDB Early Action | 10% | 58.1 | GAT |
| Zero-Shot Learning | J-HMDB Early Action | 10% | 58.1 | GAT |
| Activity Recognition | J-HMDB Early Action | 10% | 58.1 | GAT |
| Action Localization | J-HMDB Early Action | 10% | 58.1 | GAT |
| Action Detection | J-HMDB Early Action | 10% | 58.1 | GAT |
| 3D Action Recognition | J-HMDB Early Action | 10% | 58.1 | GAT |
| Graph Regression | ZINC 100k | MAE | 0.463 | GAT |
| Graph Regression | Lipophilicity | RMSE | 0.95 | GAT |
| Action Recognition | J-HMDB Early Action | 10% | 58.1 | GAT |
| Graph Classification | CIFAR10 100k | Accuracy (%) | 65.48 | GAT |
| Node Classification | Brazil Air-Traffic | Accuracy | 0.382 | GAT (Velickovic et al., 2018) |
| Node Classification | PPI | F1 | 97.3 | GAT |
| Node Classification | Wiki-Vote | Accuracy | 59.4 | GAT (Velickovic et al., 2018) |
| Node Classification | Pubmed | F1-Score | 79 | GAT |
| Node Classification | Europe Air-Traffic | Accuracy | 42.4 | GAT (Velickovic et al., 2018) |
| Node Classification | Flickr | Accuracy | 0.359 | GAT (Velickovic et al., 2018) |
| Node Classification | USA Air-Traffic | Accuracy | 58.5 | GAT (Velickovic et al., 2018) |
| Node Classification | PATTERN 100k | Accuracy (%) | 75.824 | GAT |
| Node Classification | IMDB (Heterogeneous Node Classification) | Macro-F1 | 58.94 | GAT |
| Node Classification | IMDB (Heterogeneous Node Classification) | Micro-F1 | 64.86 | GAT |
| Node Classification | Freebase (Heterogeneous Node Classification) | Accuracy | 65.26 | GAT |
| Node Classification | Freebase (Heterogeneous Node Classification) | Macro-F1 | 40.74 | GAT |
| Node Classification | DBLP (Heterogeneous Node Classification) | Macro-F1 | 93.83 | GAT |
| Node Classification | DBLP (Heterogeneous Node Classification) | Micro-F1 | 93.39 | GAT |
| Node Classification | ACM (Heterogeneous Node Classification) | Macro-F1 | 92.26 | GAT |
| Node Classification | ACM (Heterogeneous Node Classification) | Micro-F1 | 92.19 | GAT |
| Graph Property Prediction | ogbg-code2 | Number of params | 11030210 | GAT |
| Classification | CIFAR10 100k | Accuracy (%) | 65.48 | GAT |
| Node Property Prediction | ogbn-arxiv | Number of params | 1441580 | GAT+label reuse+self KD |
| Node Property Prediction | ogbn-arxiv | Number of params | 1441580 | GAT+label+reuse+topo loss |
| Node Property Prediction | ogbn-products | Number of params | 751574 | GAT with NeighborSampling |
| Node Property Prediction | ogbn-proteins | Number of params | 6360470 | GAT + labels + node2vec |