Multi-task Self-distillation for Graph-based Semi-Supervised Learning

Yating Ren, Junzhong Ji, Lingfeng Niu, Minglong Lei

2021-12-02

Node Classification

Abstract

Graph convolutional networks have made great progress in graph-based semi-supervised learning. Existing methods mainly assume that nodes connected by graph edges tend to have similar attributes and labels, so that features smoothed by local graph structures can reveal class similarities. However, in many real-world scenarios the graph structure and the labels are mismatched, and the structure may propagate misleading features or labels that eventually degrade model performance. In this paper, we propose a multi-task self-distillation framework that injects self-supervised learning and self-distillation into graph convolutional networks to address the mismatch problem separately from the structure side and the label side. First, we formulate a self-supervision pipeline based on pretext tasks to capture different levels of similarity in graphs. By jointly optimizing the pretext task and the target task, the feature extraction process is encouraged to capture more complex proximity. Consequently, the local feature aggregations are improved from the structure side. Second, self-distillation uses the soft labels of the model itself as additional supervision, which has an effect similar to label smoothing. The knowledge from the classification pipeline and the self-supervision pipeline is collectively distilled to improve the generalization ability of the model from the label side. Experimental results show that the proposed method obtains remarkable performance gains under several classic graph convolutional architectures.
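The joint objective described in the abstract (target classification loss, pretext-task loss, and distillation from the model's own soft labels) can be illustrated with a minimal sketch. This is not the authors' formulation: the weights `alpha` and `beta`, the `temperature`, the choice of KL divergence as the distillation term, and all function names are assumptions made for illustration. `teacher_logits` stands for the model's own predictions, for example from an earlier training snapshot, and `pretext_loss` is taken as an already computed scalar.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Hard-label cross-entropy for a single node."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def joint_loss(student_logits, teacher_logits, label, pretext_loss,
               alpha=0.5, beta=0.5, temperature=2.0):
    """Per-node multi-task loss with a self-distillation term.

    alpha, beta and temperature are hypothetical hyperparameters,
    not values taken from the paper.
    """
    # Supervised term: only available for labeled nodes.
    ce = cross_entropy(student_logits, label) if label is not None else 0.0
    # Distillation term: match the softened predictions of the model itself.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    distill = kl_divergence(soft_teacher, soft_student)
    return ce + alpha * pretext_loss + beta * distill

if __name__ == "__main__":
    # Labeled node: all three terms contribute.
    print(joint_loss([2.0, 0.5, -1.0], [1.5, 0.8, -0.5], label=0, pretext_loss=0.3))
    # Unlabeled node: only the pretext and distillation terms contribute.
    print(joint_loss([2.0, 0.5, -1.0], [1.5, 0.8, -0.5], label=None, pretext_loss=0.3))
```

In this sketch the softened teacher distribution spreads probability mass over non-target classes, which is how soft labels can act like label smoothing; unlabeled nodes still receive a training signal through the pretext and distillation terms.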

Results

Task	Dataset	Metric	Value	Model
Node Classification	Pubmed: fixed 20 nodes per class	Accuracy	82.72	SDSS-APPNP
Node Classification	Cora: fixed 20 nodes per class	Accuracy	86	SDSS-GCN
Node Classification	Citeseer: fixed 20 nodes per class	Accuracy	76.35	SDSS-GAT
Node Classification	AMZ Computers: fixed 20 nodes per class	Accuracy	84.86	SDSS-GCN
