Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


GAP-Layer

Spectral Gap Rewiring Layer

Graphs · Introduced 2022 · 1 paper
Source Paper

Description

TL;DR: GAP-Layer is a GNN layer that rewires a graph in an inductive and parameter-free way by optimizing the spectral gap (minimizing or maximizing the bottleneck size), learning a differentiable way to compute the Fiedler vector and the Fiedler value of the graph.

Summary

GAP-Layer is a rewiring layer that minimizes or maximizes the spectral gap (the graph bottleneck size) in an inductive way. Depending on the mining task we want to perform on our graph, we may want to maximize or minimize the size of the bottleneck, aiming for more connected or more separated communities.

GAP-Layer: Spectral Gap Rewiring

Loss and derivatives using $\mathbf{L}$ or $\mathcal{L}$

For this explanation, suppose we want to minimize the spectral gap, i.e. make the graph bottleneck smaller. To minimize the spectral gap we minimize this loss:

$$L_{Fiedler} = \|\tilde{\mathbf{A}}-\mathbf{A}\|_F + \alpha(\lambda_2)^2$$

The gradients of this cost function w.r.t. each element of $\mathbf{A}$ are not trivial. Depending on whether we use the Laplacian $\mathbf{L}$ or the normalized Laplacian $\mathcal{L}$, the derivatives differ. In the former case ($\mathbf{L}$) we use the derivatives presented in Kang et al. 2019. In the latter ($\mathcal{L}$) we present the Spectral Gradients: derivatives of the spectral gap w.r.t. the normalized Laplacian. Whichever option we choose, $\lambda_2$ can be seen as a function of $\tilde{\mathbf{A}}$, and hence $\nabla_{\tilde{\mathbf{A}}}\lambda_2$, the gradient of $\lambda_2$ w.r.t. each component of $\tilde{\mathbf{A}}$ (how does the bottleneck change with each change in our graph?), follows from the chain rule for matrix derivatives: $Tr\!\left[\left(\nabla_{\tilde{\mathbf{L}}}\lambda_2\right)^T\cdot\nabla_{\tilde{\mathbf{A}}}\tilde{\mathbf{L}}\right]$ if using the Laplacian, or $Tr\!\left[\left(\nabla_{\tilde{\mathcal{L}}}\lambda_2\right)^T\cdot\nabla_{\tilde{\mathbf{A}}}\tilde{\mathcal{L}}\right]$ if using the normalized Laplacian. Both derivatives rely on the Fiedler vector (the second eigenvector: $\mathbf{f}_2$ if we use $\mathbf{L}$, $\mathbf{g}_2$ if we use $\mathcal{L}$ instead). For the details of those derivatives, and for the sake of simplicity in this explanation, I refer the reader to the original paper.
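For the unnormalized Laplacian $\mathbf{L}=\mathbf{D}-\mathbf{A}$, first-order eigenvalue perturbation gives the closed form $\partial\lambda_2/\partial A_{uv} = (\mathbf{f}_2(u)-\mathbf{f}_2(v))^2$. A minimal NumPy sketch checking this against a finite difference (the barbell graph and helper names are illustrative, not from the paper):

```python
import numpy as np

# Illustrative graph: two triangles {0,1,2} and {3,4,5} joined by bridge (2,3).
edges = [(0,1),(0,2),(1,2),(3,4),(3,5),(4,5),(2,3)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

def fiedler(A):
    """Return (lambda_2, f_2) of the unnormalized Laplacian L = D - A."""
    L = np.diag(A.sum(1)) - A
    w, V = np.linalg.eigh(L)      # eigenvalues in ascending order
    return w[1], V[:, 1]

lam2, f2 = fiedler(A)

# Analytic gradient: d lambda_2 / d A_uv = (f2[u] - f2[v])^2
grad = (f2[:, None] - f2[None, :]) ** 2

# Check against a symmetric finite difference on the bridge edge (2,3)
eps = 1e-6
Ap = A.copy(); Ap[2, 3] += eps; Ap[3, 2] += eps
num = (fiedler(Ap)[0] - lam2) / eps
print(abs(num - grad[2, 3]) < 1e-4)
```

The gradient is largest across the bottleneck (where $\mathbf{f}_2$ changes sign), which is exactly why a gradient step on the adjacency targets the bridge edges.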

Differentiable approximation of $\mathbf{f}_2$ and $\lambda_2$

Even with those derivatives, the problem is still not trivial. Note that our cost function $L_{Fiedler}$ relies on an eigenvalue, $\lambda_2$, and the derivatives also depend on the Fiedler vector $\mathbf{f}_2$ (or $\mathbf{g}_2$), the eigenvector corresponding to that eigenvalue. However, we do NOT compute it spectrally: that computation has complexity $O(n^3)$ and would have to be repeated at every learning iteration. Instead, we learn an approximation of $\mathbf{f}_2$ and use its Dirichlet energy $\mathcal{E}(\mathbf{f}_2)$ to approximate $\lambda_2$.

$$\mathbf{f}_2(u) = \begin{cases} +1/\sqrt{n} & \text{if } u \text{ belongs to the first cluster} \\ -1/\sqrt{n} & \text{if } u \text{ belongs to the second cluster} \end{cases}$$

In addition, if using $\mathcal{L}$, since $\mathbf{g}_2=\mathbf{D}^{1/2}\mathbf{f}_2$, we first approximate $\mathbf{g}_2$ and then approximate $\lambda_2$ from $\mathcal{E}(\mathbf{g}_2)$. With this approximation, the cluster each node belongs to can be computed with a simple MLP. Moreover, since the Fiedler vector must satisfy orthogonality and normality, those restrictions must be added to the MLP clustering.
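A NumPy sketch of this approximation, assuming a hard cluster assignment and the unnormalized Laplacian (the barbell graph is an illustrative example, not from the paper). The step function above gives a unit-norm, mean-zero $\hat{\mathbf{f}}_2$, so its Dirichlet energy $\hat{\mathbf{f}}_2^T\mathbf{L}\hat{\mathbf{f}}_2$ upper-bounds and approximates $\lambda_2$:

```python
import numpy as np

# Barbell graph: two triangles {0,1,2} and {3,4,5} joined by bridge edge (2,3).
edges = [(0,1),(0,2),(1,2),(3,4),(3,5),(4,5),(2,3)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
L = np.diag(A.sum(1)) - A                 # unnormalized Laplacian

# Hard cluster assignment (in GAP-Layer this would come from Softmax(MLP(X))).
cluster = np.array([0, 0, 0, 1, 1, 1])
f2_hat = np.where(cluster == 0, 1.0, -1.0) / np.sqrt(n)  # step function above

# Dirichlet energy E(f) = f^T L f; only the bridge edge (2,3) contributes here.
lam2_hat = f2_hat @ L @ f2_hat

lam2_true = np.linalg.eigvalsh(L)[1]
print(lam2_hat, lam2_true)                # lam2_hat >= lam2_true (Rayleigh quotient)
```

Because $\hat{\mathbf{f}}_2$ is orthogonal to the constant vector and unit-norm, the energy is a Rayleigh quotient, hence never below the true $\lambda_2$; the better the clustering matches the true Fiedler sign pattern, the tighter the approximation.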

GAP-Layer

To sum up, GAP-Layer can be defined as follows. Given the matrix $\mathbf{X}_{n\times F}$ encoding the node features after any message-passing (MP) layer, $\mathbf{S}_{n\times 2}=\mathrm{Softmax}(\mathrm{MLP}(\mathbf{X}))$ learns the association $\mathbf{X}\rightarrow \mathbf{S}$, while $\mathbf{S}$ is optimized according to the loss:

$$L_{Cut} = -\frac{Tr[\mathbf{S}^T\mathbf{A}\mathbf{S}]}{Tr[\mathbf{S}^T\mathbf{D}\mathbf{S}]} + \left\|\frac{\mathbf{S}^T\mathbf{S}}{\|\mathbf{S}^T\mathbf{S}\|_F} - \frac{\mathbf{I}_2}{\sqrt{2}}\right\|_F$$
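This loss can be written directly in NumPy; a minimal sketch (the function name and toy assignments are illustrative). Since $\mathbf{S}$ is $n\times 2$, $\mathbf{S}^T\mathbf{S}$ is $2\times 2$ and the identity in the orthogonality term is $2\times 2$:

```python
import numpy as np

def cut_loss(S, A):
    """Normalized-cut term plus orthogonality regularizer (sketch)."""
    D = np.diag(A.sum(1))
    cut = -np.trace(S.T @ A @ S) / np.trace(S.T @ D @ S)
    StS = S.T @ S
    k = S.shape[1]
    ortho = np.linalg.norm(StS / np.linalg.norm(StS) - np.eye(k) / np.sqrt(k))
    return cut + ortho

# Barbell graph: two triangles joined by the bridge edge (2,3).
edges = [(0,1),(0,2),(1,2),(3,4),(3,5),(4,5),(2,3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

S_good = np.repeat(np.eye(2), 3, axis=0)        # {0,1,2} vs {3,4,5}
S_bad = np.tile(np.eye(2), (3, 1))              # alternating assignment
print(cut_loss(S_good, A), cut_loss(S_bad, A))  # good partition scores lower
```

The first term rewards keeping edges inside clusters (it is most negative when the cut is small), while the second keeps the two cluster sizes balanced.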

Then $\mathbf{f}_2$ is approximated from $\mathbf{S}$ using the step equation for $\mathbf{f}_2(u)$. Once $\mathbf{f}_2$ and $\lambda_2$ have been computed, we consider the loss:

$$L_{Fiedler} = \|\tilde{\mathbf{A}}-\mathbf{A}\|_F + \alpha(\lambda_2)^2$$

The update $\tilde{\mathbf{A}} = \mathbf{A} - \mu \nabla_{\tilde{\mathbf{A}}}\lambda_2$ returns the rewired $\tilde{\mathbf{A}}$. Then the GAP diffusion $\mathbf{T}^{GAP} = \tilde{\mathbf{A}}(\mathbf{S}) \odot \mathbf{A}$ results from minimizing

$$L_{GAP} = L_{Cut} + L_{Fiedler}$$
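Putting the pieces together, one rewiring step can be sketched in NumPy for the unnormalized Laplacian, using the closed-form gradient $(\mathbf{f}_2(u)-\mathbf{f}_2(v))^2$ from the perturbation result above (the graph, step size $\mu$, and variable names are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

# Barbell graph: two triangles joined by the bridge edge (2,3).
edges = [(0,1),(0,2),(1,2),(3,4),(3,5),(4,5),(2,3)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

# 1) Cluster assignment (stands in for Softmax(MLP(X))) -> approximate f2.
cluster = np.array([0, 0, 0, 1, 1, 1])
f2 = np.where(cluster == 0, 1.0, -1.0) / np.sqrt(n)

# 2) Gradient of lambda_2 w.r.t. A for L = D - A: (f2[u] - f2[v])^2.
grad = (f2[:, None] - f2[None, :]) ** 2

# 3) Gradient step to shrink the gap, then mask with A (GAP diffusion).
mu = 0.5
A_tilde = A - mu * grad
T_gap = A_tilde * A            # elementwise product keeps only existing edges

print(T_gap[2, 3], T_gap[0, 1])  # the bridge edge is weakened the most
```

The elementwise mask with $\mathbf{A}$ means the step only reweights existing edges; the bridge edge, where $\mathbf{f}_2$ changes sign, receives the largest downweighting, which is the intended behavior when minimizing the spectral gap.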

References

(Kang et al. 2019) Kang, J., & Tong, H. (2019). N2N: Network Derivative Mining. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM) (pp. 861–870).

Papers Using This Method

DiffWire: Inductive Graph Rewiring via the Lovász Bound2022-06-15