100 machine learning methods and techniques
Modularity preserving NMF
Spherical Graph Convolutional Network
BiGG is an autoregressive model for the generative modeling of sparse graphs. It exploits sparsity to avoid generating the full adjacency matrix and reduces the graph generation time complexity to O((n + m) log n). Furthermore, during training this autoregressive model can be parallelized with O(log n) synchronization stages, which makes it much more efficient than other autoregressive models that require Ω(n). The approach is based on three key elements: (1) an O(log n) process for generating each edge using a binary tree data structure, inspired by R-MAT; (2) a tree-structured autoregressive model for generating the set of edges associated with each node; and (3) an autoregressive model defined over the sequence of nodes.
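As a concrete illustration of the binary-tree idea, here is a minimal sketch of R-MAT-style edge generation, in which each edge is produced by O(log n) recursive quadrant choices instead of touching the full adjacency matrix. This is not BiGG itself (BiGG learns these decisions autoregressively); the function name and quadrant probabilities are illustrative.

```python
import random

def rmat_edge(n, probs=(0.57, 0.19, 0.19, 0.05), rng=random):
    """Sample one edge (row, col) of an n x n adjacency matrix (n a power
    of two) by recursively descending into one of the four quadrants:
    O(log n) decisions per edge, never materialising the matrix.
    probs = (top-left, top-right, bottom-left, bottom-right)."""
    r0, r1, c0, c1 = 0, n, 0, n
    a, b, c, d = probs
    while r1 - r0 > 1 or c1 - c0 > 1:
        u = rng.random()
        top = u < a + b                               # top vs bottom half
        left = (u < a) or (a + b <= u < a + b + c)    # left vs right half
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        if r1 - r0 > 1:
            r0, r1 = (r0, rm) if top else (rm, r1)
        if c1 - c0 > 1:
            c0, c1 = (c0, cm) if left else (cm, c1)
    return r0, c0
```

Repeating this m times yields m edges in O(m log n) total, which is the same sparsity-aware scaling BiGG builds on.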
Cluster-GCN is a novel GCN algorithm suitable for SGD-based training that exploits the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to that subgraph. This simple but effective strategy significantly improves memory and computational efficiency while achieving test accuracy comparable to previous algorithms. Description and image from: Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
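The within-cluster restriction can be sketched in a few lines of NumPy. This is a hypothetical minimal version (the function name and the row normalisation are illustrative, not the authors' implementation): propagation for each cluster only ever touches the cluster's own adjacency block, so memory scales with the cluster size rather than the full graph.

```python
import numpy as np

def cluster_gcn_step(A, X, W, clusters):
    """One Cluster-GCN-style layer: for each cluster of node ids,
    propagate features only over the within-cluster subgraph."""
    H = np.zeros((X.shape[0], W.shape[1]))
    for idx in clusters:                            # idx: array of node ids
        A_sub = A[np.ix_(idx, idx)]                 # within-cluster adjacency
        deg = A_sub.sum(1, keepdims=True) + 1       # degree incl. self-loop
        A_hat = (A_sub + np.eye(len(idx))) / deg    # row-normalised
        H[idx] = np.maximum(A_hat @ X[idx] @ W, 0)  # ReLU(Â X W)
    return H
```

In the real algorithm the clusters come from a graph clustering method such as METIS, and each SGD step processes one (or a few merged) cluster blocks.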
Structural Deep Network Embedding
Network Embedding as Matrix Factorization:
Canonical Tensor Decomposition with N3 Regularizer
Canonical Tensor Decomposition, trained with N3 regularizer
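The CP score and the N3 penalty are simple to write down; here is a minimal NumPy sketch (function name and weight `lam` are illustrative) showing the trilinear CP score for one triple together with the nuclear 3-norm surrogate, i.e. the sum of cubed absolute values of the factors involved.

```python
import numpy as np

def cp_score_n3(h, r, t, lam=1e-2):
    """CP triple score <h, r, t> plus the N3 regularisation term:
    lam * sum(|h|^3 + |r|^3 + |t|^3) over the embedding dimensions."""
    score = np.sum(h * r * t)                       # CP trilinear score
    n3 = (np.abs(h) ** 3 + np.abs(r) ** 3 + np.abs(t) ** 3).sum()
    return score, lam * n3
```

During training the penalty is added to the cross-entropy loss of each sampled triple, which is what distinguishes this model from plain CP with L2 regularization.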
Relational Reflection Entity Alignment
Graph Convolutional Networks for Fake News Detection
Social media are nowadays one of the main news sources for millions of people around the globe due to their low cost, easy access and rapid dissemination. This however comes at the cost of dubious trustworthiness and significant risk of exposure to 'fake news', intentionally written to mislead the readers. Automatically detecting fake news poses challenges that defy existing content-based analysis approaches. One of the main reasons is that often the interpretation of the news requires the knowledge of political or social context or 'common sense', which current NLP algorithms are still missing. Recent studies have shown that fake and real news spread differently on social media, forming propagation patterns that could be harnessed for the automatic fake news detection. Propagation-based approaches have multiple advantages compared to their content-based counterparts, among which is language independence and better resilience to adversarial attacks. In this paper we present a novel automatic fake news detection model based on geometric deep learning. The underlying core algorithms are a generalization of classical CNNs to graphs, allowing the fusion of heterogeneous data such as content, user profile and activity, social graph, and news propagation. Our model was trained and tested on news stories, verified by professional fact-checking organizations, that were spread on Twitter. First, our experiments indicate that social network structure and propagation are important features allowing highly accurate (92.7% ROC AUC) fake news detection. Second, we observe that fake news can be reliably detected at an early stage, after just a few hours of propagation. Third, we test the aging of our model on training and testing data separated in time. Our results point to the promise of propagation-based approaches for fake news detection as an alternative or complementary strategy to content-based approaches.
MinCut Pooling
MinCutPool is a trainable pooling operator for graphs that learns to map nodes into clusters. The method is trained to approximate the minimum K-cut of the graph to ensure that the clusters are balanced, while also jointly optimizing the objective of the task at hand.
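The two terms of the MinCutPool objective, a normalised-cut term and a cluster-balance (orthogonality) term, can be written down directly from a soft assignment matrix. A minimal NumPy sketch (function name illustrative; in practice S comes from an MLP and the losses are minimised jointly with the task loss):

```python
import numpy as np

def mincut_losses(A, S):
    """MinCut-pooling losses for a soft assignment S (n x K, rows sum
    to 1): L_cut = -Tr(S'AS)/Tr(S'DS) and an orthogonality term that
    pushes S'S toward a scaled identity, i.e. balanced clusters."""
    D = np.diag(A.sum(1))
    cut = -np.trace(S.T @ A @ S) / np.trace(S.T @ D @ S)
    StS = S.T @ S
    K = S.shape[1]
    ortho = np.linalg.norm(StS / np.linalg.norm(StS) - np.eye(K) / np.sqrt(K))
    return cut, ortho
```

For a perfect assignment on two disconnected communities the cut term reaches its minimum of -1 and the orthogonality term vanishes.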
Multi-view Knowledge Graph Embedding
Recurrent Event Network
Recurrent Event Network (RE-NET) is an autoregressive architecture for predicting future interactions. The occurrence of a fact (event) is modeled as a probability distribution conditioned on temporal sequences of past knowledge graphs. RE-NET employs a recurrent event encoder to encode past facts and uses a neighborhood aggregator to model the connection of facts at the same timestamp. Future facts can then be inferred in a sequential manner based on the two modules.
Gated Attention Networks
Gated Attention Networks (GaAN) is a new architecture for learning on graphs. Unlike the traditional multi-head attention mechanism, which equally consumes all attention heads, GaAN uses a convolutional sub-network to control each attention head’s importance. Image credit: GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs
Graph Neural Networks with Continual Learning
Although significant effort has been applied to fact-checking, the prevalence of fake news over social media, which has profound impact on justice, public trust and our society, remains a serious problem. In this work, we focus on propagation-based fake news detection, as recent studies have demonstrated that fake news and real news spread differently online. Specifically, considering the capability of graph neural networks (GNNs) in dealing with non-Euclidean data, we use GNNs to differentiate between the propagation patterns of fake and real news on social media. In particular, we concentrate on two questions: (1) Without relying on any text information, e.g., tweet content, replies and user descriptions, how accurately can GNNs identify fake news? Machine learning models are known to be vulnerable to adversarial attacks, and avoiding the dependence on text-based features can make the model less susceptible to the manipulation of advanced fake news fabricators. (2) How to deal with new, unseen data? In other words, how does a GNN trained on a given dataset perform on a new and potentially vastly different dataset? If it achieves unsatisfactory performance, how do we solve the problem without re-training the model on the entire data from scratch? We study the above questions on two datasets with thousands of labelled news items, and our results show that: (1) GNNs can achieve comparable or superior performance without any text information to state-of-the-art methods. (2) GNNs trained on a given dataset may perform poorly on new, unseen data, and direct incremental training cannot solve the problem---this issue has not been addressed in the previous work that applies GNNs for fake news detection. In order to solve the problem, we propose a method that achieves balanced performance on both existing and new datasets, by using techniques from continual learning to train GNNs incrementally.
Adversarial Graph Contrastive Learning
Watch Your Step
Graph Finite-State Automaton
Graph Finite-State Automaton, or GFSA, is a differentiable layer for learning graph structure that adds a new edge type (expressed as a weighted adjacency matrix) to a base graph. This layer can be trained end-to-end to add derived relationships (edges) to arbitrary graph-structured data based on performance on a downstream task.
Medical Entity Disambiguation using Graph Neural Networks
CP with N3 Regularizer and Relation Prediction
CP model trained with the N3 regularizer and a relation prediction objective on top of the 1vsAll loss.
Commute Times Layer
TL;DR: CT-Layer is a GNN layer that rewires a graph in an inductive and parameter-free way according to the commute times distance (or effective resistance). It does so by learning, in a differentiable way, how to compute the CT embedding of the graph.

Summary
CT-Layer learns the commute times distance between nodes (i.e., the effective resistance distance) in a differentiable way, instead of through the common spectral computation, and in a parameter-free manner, which is not the case for the heat kernel. This approach allows the problem to be solved as an optimization problem inside a GNN, yielding a new layer that learns how to rewire a given graph in an optimal and inductive way. In addition, CT-Layer learns commute times embeddings and can therefore calculate them for any graph inductively. The commute times embedding is related to the eigenvalues and eigenvectors of the Laplacian of the graph, because the CT embedding is just the eigenvectors rescaled; hence CT-Layer also learns how to calculate the spectrum of the Laplacian in a differentiable way, and the embedding must satisfy orthogonality and normality. Finally, recent connections have been found between the commute times distance and curvature (which is non-differentiable), establishing equivalences between them; CT-Layer can therefore also be seen as a differentiable version of curvature-based rewiring. What follows is a quick overview of the layer; see the paper for a detailed explanation.

Downsides of the spectral CT embedding
In the literature (prior to this method) the CT embedding is computed spectrally, or approximated with the heat kernel (which depends strongly on its time hyperparameter). Neither route allows a differentiable method built on that measure. The CT distance is given by the (squared) Euclidean distance between the embeddings: $CT(u,v) = \|z_u - z_v\|^2$. In spectral form, $z_u = \sqrt{vol(G)}\,\Lambda^{-1/2}\Phi^\top e_u$, where $\Phi$ are the eigenvectors and $\Lambda$ the eigenvalues of the graph Laplacian.

This embedding and distance give us desirable properties of the graph, such as an understanding of its structure, or an embedding based on the spectrum that minimizes Dirichlet energies. However, the spectral computation is not differentiable.

CT-Layer as an optimization problem: differentiable, learnable and inductive
Given that the CT embedding $Z$ minimizes Dirichlet energies subject to being orthogonal and normalized, we can formulate the problem as constraining neighboring nodes to have similar embeddings subject to $Z^\top Z = I$. With the above elements we can define CT-Layer, our rewiring layer: given the matrix $X$ encoding the features of the nodes after any message passing (MP) layer, the layer learns the association $X \mapsto Z$ while $Z$ is optimized according to the Dirichlet loss. This results in a resistance diffusion (the Hadamard product between the resistance-distance matrix and the adjacency matrix), which provides the subsequent MP layer with a learnt convolution matrix. As explained before, $Z$ is the commute times embedding matrix, and the pairwise Euclidean distances of the learned embeddings are the commute times distances, i.e., resistance distances. Therefore, once this layer is trained, it can calculate the commute times embedding for a new graph and rewire that new, unseen graph in a principled way based on the commute times distance.

Preservation of structure
Does this rewiring preserve the original structure? Let $G'$ be a subgraph of $G$ obtained by sampling edges with probability proportional to their effective resistance (i.e., commute times). Then, by standard spectral sparsification results, sampling $O(n\log n/\epsilon^2)$ edges suffices for $G'$ to be a $(1\pm \epsilon)$ spectral approximation of $G$.
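For reference, the exact (spectral, non-differentiable) quantity that CT-Layer learns to approximate can be computed from the Laplacian pseudoinverse; a minimal NumPy sketch:

```python
import numpy as np

def commute_times(A):
    """Exact commute times CT(u,v) = vol(G) * R_eff(u,v), with
    effective resistances from the Moore-Penrose pseudoinverse of the
    graph Laplacian. O(n^3): this is what the layer avoids at train time."""
    L = np.diag(A.sum(1)) - A
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    R = d[:, None] + d[None, :] - 2 * Lp   # effective resistance matrix
    return A.sum() * R                     # vol(G) = sum of degrees
```

On a single edge the commute time is 2 (one step out, one step back), which makes this a convenient sanity check for any learned approximation.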
Wasserstein Embedding for Graph Learning
Dynamic Graph Event Detection
Knowledge Graph Refiner
Pseudoinverse Graph Convolutional Network
A GCN method targeted at the unique spectral properties of dense graphs and hypergraphs, enabled by efficient numerical linear algebra.
CP with N3 Regularizer
CP model trained with the N3 regularizer.
This is a general approach to convert a neural network into an analytic equation. The technique works as follows: 1. Encourage sparse latent representations. 2. Apply symbolic regression to approximate the transformations between the input, latent and output layers. 3. Compose the symbolic expressions. In the paper, we show that known equations, including force laws and Hamiltonians, can be extracted from the neural network. We then apply our method to a non-trivial cosmology example, a detailed dark matter simulation, and discover a new analytic formula which can predict the concentration of dark matter from the mass distribution of nearby cosmic structures. The symbolic expressions extracted from the GNN using our technique also generalized to out-of-distribution data better than the GNN itself. Our approach offers alternative directions for interpreting neural networks and discovering novel physical principles from the representations they learn.
Strain Elevation Tension Spring embedding
SETSe is a deterministic, physics-based graph embedding algorithm for weighted, feature-rich networks. It treats each edge as a spring and each node as a bead whose movement is constrained by the graph adjacency matrix, so that the nodes move in parallel planes enforcing a minimum distance between neighboring nodes. The node features act as forces moving the nodes up and down. The network converges to the embedded state when the force produced by each node is equal and opposite to the sum of the forces exerted by its edges, creating a net force of zero. SETSe has no conventional loss function and does not attempt to place similar nodes close to each other.
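A minimal toy simulation of the force-balance idea described above (illustrative only; SETSe's actual update rules, spring tensions and convergence checks differ in detail, and all names here are hypothetical): node features exert vertical forces, edges act as springs pulling neighbours' elevations together, and the system is damped until net forces vanish.

```python
import numpy as np

def setse_embed(A, force, k=1.0, dt=0.1, damping=0.9, steps=2000):
    """Damped spring relaxation: z holds node elevations; `force` is the
    per-node feature force and must sum to ~0 for an equilibrium to exist.
    At convergence each node's force balances its spring forces."""
    z = np.zeros(len(force))
    v = np.zeros_like(z)
    for _ in range(steps):
        spring = k * (A @ z - A.sum(1) * z)   # sum_j A_ij (z_j - z_i)
        v = damping * (v + dt * (force + spring))
        z += dt * v
    return z
```

For two nodes joined by one unit spring with forces +1 and -1, force balance gives an elevation difference of exactly 1, which the relaxation recovers.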
\emph{Canonical partition} crops the index-restricted $d$-hop neighborhood around the center node from the target graph. $\mathcal{D}(G_t, v_i, v_c)$ denotes the shortest distance between $v_i$ and $v_c$ on $G_t$. \begin{equation} \mathcal{P}(G_t, v_c, d) = G_c, \operatorname{ s.t. } G_c \subseteq G_t,\ V_c = \{ v_i \in V_t \mid \mathcal{D}(G_t, v_i, v_c) \leq d,\ v_i \leq v_c\} \end{equation} The graph obtained by canonical partition is called the \emph{canonical neighborhood}. Canonical neighborhoods can correctly substitute for the target graph in canonical count: the subgraph count of the query in the target equals the summation of the canonical counts of the query in the canonical neighborhoods over all target nodes. Canonical neighborhoods are acquired with canonical partition $\mathcal{P}$, given any $d$ greater than or equal to the diameter of the query. \begin{equation} \mathcal{C}(G_q, G_t) = \sum_{v_c \in V_t} \mathcal{C}_c(G_q, \mathcal{P}(G_t, v_c, d), v_c), \quad d \geq \max_{v_i, v_j \in V_q} \mathcal{D}(G_q, v_i, v_j) \end{equation}
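The canonical partition itself is a depth-limited BFS followed by the index restriction; a small Python sketch (names illustrative, adjacency given as a dict of neighbor lists) that returns the node set $V_c$:

```python
from collections import deque

def canonical_partition(adj, vc, d):
    """Node set of the canonical neighborhood: nodes within distance d
    of the center vc whose index is <= vc, via breadth-first search."""
    dist = {vc: 0}
    q = deque([vc])
    while q:
        u = q.popleft()
        if dist[u] == d:          # do not expand beyond d hops
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return {v for v in dist if v <= vc}
```

On the path 0-1-2-3 with center 2 and d = 1, the 1-hop ball is {1, 2, 3} and the index restriction keeps {1, 2}.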
AdaGPR is an adaptive, layer-wise graph convolution model. AdaGPR applies adaptive generalized PageRanks at each layer of a GCNII model, learning to predict the generalized PageRank coefficients using sparse solvers.
Multiplex Molecular Graph Neural Network
The Multiplex Molecular Graph Neural Network (MXMNet) is an approach to representation learning for molecules. The molecular interactions are divided into two categories, local and global, and a two-layer multiplex graph is constructed for each molecule: the local layer contains only the local connections that mainly capture covalent interactions, while the global layer contains the global connections that cover non-covalent interactions. MXMNet uses a Multiplex Molecular (MXM) module that combines a novel angle-aware message passing operated on the local layer with an efficient message passing operated on the global layer.
Heterogeneous Molecular Graph Neural Network
As they carry great potential for modeling complex interactions, graph neural network (GNN)-based methods have been widely used to predict quantum mechanical properties of molecules. Most of the existing methods treat molecules as molecular graphs in which atoms are modeled as nodes. They characterize each atom's chemical environment by modeling its pairwise interactions with other atoms in the molecule. Although these methods achieve a great success, limited amount of works explicitly take many-body interactions, i.e., interactions between three and more atoms, into consideration. In this paper, we introduce a novel graph representation of molecules, heterogeneous molecular graph (HMG) in which nodes and edges are of various types, to model many-body interactions. HMGs have the potential to carry complex geometric information. To leverage the rich information stored in HMGs for chemical prediction problems, we build heterogeneous molecular graph neural networks (HMGNN) on the basis of a neural message passing scheme. HMGNN incorporates global molecule representations and an attention mechanism into the prediction process. The predictions of HMGNN are invariant to translation and rotation of atom coordinates, and permutation of atom indices. Our model achieves state-of-the-art performance in 9 out of 12 tasks on the QM9 dataset.
ComplEx with N3 Regularizer and Relation Prediction Objective
ComplEx model trained with a nuclear 3-norm (N3) regularizer; a relation prediction objective is added on top of the commonly used 1vsAll objective.
Dual Graph Convolutional Networks
A dual graph convolutional neural network jointly considers the two essential assumptions of semi-supervised learning: (1) local consistency and (2) global consistency. Accordingly, two convolutional neural networks are devised to embed the local-consistency-based and global-consistency-based knowledge, respectively. Description and image from: Dual Graph Convolutional Networks for Graph-Based Semi-Supervised Classification
Inductive Shallow Node Embedding
Inductive Shallow Node Embedding extends shallow embeddings to the realm of inductive learning. It has a novel encoder architecture that captures the local neighborhood structure of each node, enabling effective generalization to unseen nodes. For such generalization, robustness is essential to avoid the performance degradation arising from noise in the dataset. It has been theoretically proven that the covariance of the additive noise term in the proposed model is inversely proportional to the cardinality of a node's neighborhood. Another contribution is a mathematical lower bound quantifying the robustness of node embeddings, confirming the method's advantage over traditional shallow embedding methods, particularly in the presence of parameter noise. The method demonstrably excels in dynamic networks, consistently achieving over 90% performance on previously unseen nodes relative to nodes encountered during training on various benchmarks. The empirical evaluation concludes that it outperforms competing methods on the vast majority of datasets in both transductive and inductive tasks.
Instruction Pointer Attention Graph Neural Network
Instruction Pointer Attention Graph Neural Network, or IPA-GNN, is a GNN-based neural network for learning to execute programs. It achieves improved systematic generalization on the task of learning to execute programs using control flow graphs. The model arises by considering RNNs operating on program traces with branch decisions as latent variables. The IPA-GNN can be seen either as a continuous relaxation of the RNN model or as a GNN variant more tailored to execution.
Graph Echo State Network
Graph Echo State Network (GraphESN) is a generalization of the Echo State Network (ESN) approach to graph domains. GraphESNs provide an efficient extension of Recursive Neural Network (RecNN) modeling to cyclic/acyclic, directed/undirected, labeled graphs. The recurrent reservoir of the network computes a fixed contractive encoding function over graphs and is left untrained after initialization, while a feed-forward readout implements an adaptive linear output function. Contractivity of the state transition function implies a Markovian characterization of state dynamics and stability of the state computation in the presence of cycles. Due to the use of a fixed (untrained) encoding, the model represents both an extremely efficient version of, and a baseline for, recursive models with trained connections. Description from: Graph Echo State Networks
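The fixed contractive encoding can be sketched in a few lines. This is an illustrative NumPy version (names and scaling are assumptions, not the reference implementation): each node state is updated from its input label and its neighbours' states, and, provided the reservoir weights are scaled so the map is contractive, the iteration settles to a stable state even on cyclic graphs.

```python
import numpy as np

def graph_esn_states(A, X, W_in, W_hat, iters=200):
    """Untrained graph reservoir: iterate
    states = tanh(X W_in' + (A @ states) W_hat') to a fixed point.
    W_hat must be scaled small enough for the map to be contractive.
    Only a linear readout on the returned states would be trained."""
    n, h = A.shape[0], W_hat.shape[0]
    states = np.zeros((n, h))
    for _ in range(iters):
        states = np.tanh(X @ W_in.T + (A @ states) @ W_hat.T)
    return states
```

Because the map is a contraction, the returned states satisfy the update equation to numerical precision, which is exactly the stability property the description refers to.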
GeniePath is a scalable approach for learning adaptive receptive fields of neural networks defined on permutation-invariant graph data. GeniePath proposes an adaptive path layer consisting of two complementary functions designed for breadth and depth exploration, respectively: the former learns the importance of different-sized neighborhoods, while the latter extracts and filters signals aggregated from neighbors different hops away. Description and image from: GeniePath: Graph Neural Networks with Adaptive Receptive Paths
RESCAL
StoGCN is a control-variate-based algorithm that allows sampling an arbitrarily small neighbor size. It comes with a new theoretical guarantee that the algorithm converges to a local optimum of GCN.
RelDiff generates entity-relation-entity embeddings in a single embedding space. RelDiff adopts two fundamental vector algebraic operators to transform entity and relation embeddings from knowledge graphs into entity-relation-entity embeddings. In particular, RelDiff can encode finer-grained information about the relations than is captured when separate embeddings are learned for the entities and the relations.
DeepDrug is a deep learning framework that uses graph convolutional networks to learn graphical representations of drugs and proteins, such as molecular fingerprints and residual structures, in order to boost prediction accuracy.
Spectral Gap Rewiring Layer
TL;DR: GAP-Layer is a GNN layer that rewires a graph in an inductive and parameter-free way by optimizing the spectral gap (minimizing or maximizing the bottleneck size), learning a differentiable way to compute the Fiedler vector and the Fiedler value of the graph.

Summary
GAP-Layer is a rewiring layer based on minimizing or maximizing the spectral gap (or graph bottleneck size) in an inductive way. Depending on the mining task we want to perform on the graph, we may want to maximize or minimize the size of the bottleneck, aiming for more connected or more separated communities.

GAP-Layer: spectral gap rewiring loss and derivatives
For this explanation, suppose we want to minimize the spectral gap, i.e., make the graph bottleneck smaller. To do so, we minimize a loss based on the Fiedler value $\lambda_2$. The gradients of this cost function w.r.t. each element of the adjacency matrix $A$ are not trivial, and they differ depending on whether we use the Laplacian $L$ or the normalized Laplacian $\mathcal{L}$. In the former case, we use the derivatives presented in Kang et al. (2019); in the latter, we present the Spectral Gradients: derivatives of the spectral gap w.r.t. the normalized Laplacian. Whichever option we choose, $\lambda_2$ can be seen as a function of $L$ (or $\mathcal{L}$), which in turn is a function of $A$; hence the gradient of the loss w.r.t. each component of $A$ (how does the bottleneck change with each change to our graph?) follows from the chain rule of the matrix derivative, through $L$ if using the Laplacian or through $\mathcal{L}$ if using the normalized Laplacian. Both of these derivatives rely on the Fiedler vector (the second eigenvector of $L$, or of $\mathcal{L}$ if the normalized Laplacian is used instead). For more details on those derivatives, see the original paper.

Differentiable approximation of the Fiedler value and vector
Once we have those derivatives, the problem is still not trivial. Note that our cost function relies on the eigenvalue $\lambda_2$, and the derivatives also depend on the corresponding Fiedler vector. However, we DO NOT COMPUTE IT SPECTRALLY, as its computation has $O(n^3)$ complexity and would need to be repeated in every learning iteration. Instead, we learn an approximation of the Fiedler vector and use its Dirichlet energy to approximate $\lambda_2$. In addition, when using the normalized Laplacian, we first approximate the Fiedler vector of $L$ and then derive the normalized version from it. With this approximation, we can easily compute the cluster each node belongs to with a simple MLP; and since the Fiedler vector must satisfy orthogonality and normality, those restrictions are added to the MLP clustering.

GAP-Layer
To sum up, GAP-Layer can be defined as follows. Given the matrix $X$ encoding the features of the nodes after any message passing (MP) layer, the layer learns the association from $X$ to the approximate Fiedler vector, which is optimized according to the clustering loss; $\lambda_2$ is then approximated from that vector, and the spectral-gap loss is minimized, returning the GAP diffusion, i.e., the rewired adjacency provided to the subsequent MP layer.

References
(Kang et al. 2019) Kang, J., & Tong, H. (2019, November). N2N: Network derivative mining. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (pp. 861-870).
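For contrast, the exact spectral computation that GAP-Layer avoids at training time is a full eigendecomposition of the Laplacian; a minimal NumPy sketch:

```python
import numpy as np

def fiedler(A):
    """Fiedler value lambda_2 (the spectral gap) and Fiedler vector of
    the graph Laplacian L = D - A, via a full O(n^3) eigendecomposition.
    np.linalg.eigh returns eigenvalues in ascending order."""
    L = np.diag(A.sum(1)) - A
    w, V = np.linalg.eigh(L)
    return w[1], V[:, 1]          # second-smallest eigenpair
```

For the 3-node path graph the Laplacian eigenvalues are 0, 1 and 3, so the spectral gap is 1; a disconnected graph would have gap 0.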
TuckER with Relation Prediction
TuckER model trained with a relation prediction objective on top of the 1vsAll loss
RESCAL with Relation Prediction
RESCAL model trained with a relation prediction objective on top of the 1vsAll loss
Automated Graph Learning
Automated graph learning is a method that aims at discovering the best hyper-parameter and neural architecture configuration for different graph tasks/data without manual design.
The core ingredient of CayleyNet is a new class of parametric rational complex functions (Cayley polynomials) that allow spectral filters specializing on frequency bands of interest to be computed efficiently on graphs. The model generates rich spectral filters that are localized in space, scales linearly with the size of the input data for sparsely-connected graphs, and can handle different constructions of Laplacian operators. Description adapted from: CayleyNets: Graph Convolutional Neural Networks with Complex Rational Spectral Filters
Neural network for graphs
NN4G is based on a constructive feedforward architecture with state variables that uses neurons with no feedback connections. The neurons are applied to the input graphs by a general traversal process that relaxes the constraints of previous approaches derived by the causality assumption over hierarchical input data. Description from: Neural network for graphs: a contextual constructive approach
Local Augmentation for Graph Neural Networks, or LA-GNN, is a data augmentation technique that enhances node features through local subgraph structure. Specifically, it learns the conditional distribution of the connected neighbors' representations given the representation of the central node, analogous to the Skip-gram model of word2vec, which predicts the probability of the context given the central word. After augmenting the neighborhood, the initial and generated feature matrices are concatenated as input for GNNs.