Data sourced from the PWC Archive (CC-BY-SA 4.0).

Papers/Multi-task Self-distillation for Graph-based Semi-Supervis...

Multi-task Self-distillation for Graph-based Semi-Supervised Learning

Yating Ren, Junzhong Ji, Lingfeng Niu, Minglong Lei

2021-12-02 · Node Classification

Abstract

Graph convolutional networks have made great progress in graph-based semi-supervised learning. Existing methods mainly assume that nodes connected by graph edges tend to have similar attributes and labels, so that features smoothed over local graph structures can reveal class similarities. However, in many real-world scenarios graph structures and labels are mismatched, and the structures may then propagate misleading features or labels that ultimately degrade model performance. In this paper, we propose a multi-task self-distillation framework that injects self-supervised learning and self-distillation into graph convolutional networks to address the mismatch problem separately from the structure side and the label side. First, we formulate a self-supervision pipeline based on pretext tasks to capture different levels of similarity in graphs. Jointly optimizing the pretext task and the target task encourages the feature extraction process to capture more complex proximity, which improves local feature aggregation from the structure side. Second, self-distillation uses the model's own soft labels as additional supervision, which has an effect similar to label smoothing. The knowledge from the classification pipeline and the self-supervision pipeline is collectively distilled to improve the model's generalization ability from the label side. Experimental results show that the proposed method obtains remarkable performance gains under several classic graph convolutional architectures.
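As a rough illustration of how the objective described in the abstract could fit together, the sketch below combines one GCN-style feature-smoothing step with a multi-task loss: supervised cross-entropy on labeled nodes, a pretext-task loss on all nodes, and a self-distillation term on softened predictions from both pipelines. Everything here is an assumption rather than the authors' formulation: the pretext targets (`ssl_targets`) are hypothetical pseudo-labels, the teacher logits (`teacher_cls`, `teacher_ssl`) stand in for a frozen snapshot of the same model, and the weights `lam_ssl`, `lam_kd` and temperature `temp` are illustrative. It only evaluates the scalar loss with NumPy; gradients would come from an autodiff framework in practice.

```python
import numpy as np


def normalized_adjacency(adj):
    """GCN-style symmetric normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)             # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(logits, targets):
    """Mean negative log-likelihood of integer targets."""
    probs = softmax(logits)
    return float(-np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12)))


def kl_div(p_teacher, p_student):
    """Mean KL(teacher || student) over nodes."""
    log_ratio = np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)
    return float(np.mean(np.sum(p_teacher * log_ratio, axis=1)))


def multitask_sd_loss(cls_logits, ssl_logits, labels, labeled_idx, ssl_targets,
                      teacher_cls, teacher_ssl, lam_ssl=1.0, lam_kd=1.0, temp=2.0):
    # Target task: supervised cross-entropy on the few labeled nodes only.
    loss_cls = cross_entropy(cls_logits[labeled_idx], labels[labeled_idx])
    # Pretext task: self-supervised loss on all nodes (hypothetical pseudo-label targets).
    loss_ssl = cross_entropy(ssl_logits, ssl_targets)
    # Self-distillation (assumed form): soft labels from a frozen snapshot of the same
    # model supervise both pipelines; temp**2 follows the usual distillation scaling.
    kd = (kl_div(softmax(teacher_cls, temp), softmax(cls_logits, temp))
          + kl_div(softmax(teacher_ssl, temp), softmax(ssl_logits, temp)))
    return loss_cls + lam_ssl * loss_ssl + lam_kd * (temp ** 2) * kd


# Toy usage on a 6-node path graph with random features and linear heads.
rng = np.random.default_rng(0)
adj = np.zeros((6, 6))
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
x = rng.normal(size=(6, 4))
h = normalized_adjacency(adj) @ x                   # one round of local feature smoothing
cls_logits = h @ rng.normal(size=(4, 3))            # 3 target classes
ssl_logits = h @ rng.normal(size=(4, 2))            # 2 hypothetical pretext classes
labels = np.array([0, 1, 2, 0, 1, 2])
labeled_idx = np.array([0, 1, 2])
ssl_targets = np.array([0, 0, 1, 1, 0, 1])
teacher_cls = cls_logits + 0.1 * rng.normal(size=cls_logits.shape)  # stand-in "earlier" predictions
teacher_ssl = ssl_logits + 0.1 * rng.normal(size=ssl_logits.shape)
loss = multitask_sd_loss(cls_logits, ssl_logits, labels, labeled_idx, ssl_targets,
                         teacher_cls, teacher_ssl)
print(round(loss, 4))
```

If the teacher logits equal the current logits, the distillation term vanishes and the loss reduces to the two cross-entropy terms. Because the distillation targets are softened distributions rather than one-hot labels, the KL term pulls the model toward smoother outputs, which is one way to read the abstract's comparison with label smoothing.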

Results

Task | Dataset | Metric | Value | Model
Node Classification | Pubmed: fixed 20 nodes per class | Accuracy (%) | 82.72 | SDSS-APPNP
Node Classification | Cora: fixed 20 nodes per class | Accuracy (%) | 86 | SDSS-GCN
Node Classification | Citeseer: fixed 20 nodes per class | Accuracy (%) | 76.35 | SDSS-GAT
Node Classification | AMZ Computers: fixed 20 nodes per class | Accuracy (%) | 84.86 | SDSS-GCN
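The dataset names in the results refer to a low-label setting with 20 labeled training nodes per class. The sketch below, assuming integer class labels in a NumPy array, shows what that labeling budget looks like and how an accuracy percentage like those in the Value column is computed. The helper names `per_class_split` and `accuracy_percent` are hypothetical, and benchmark splits are typically fixed index lists distributed with the datasets rather than this seeded sampler.

```python
import numpy as np


def per_class_split(labels, per_class=20, seed=0):
    """Sample `per_class` labeled training nodes from each class (seeded, so the split is fixed)."""
    rng = np.random.default_rng(seed)
    train = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        train.extend(rng.choice(members, size=per_class, replace=False).tolist())
    return np.sort(np.array(train))


def accuracy_percent(pred, labels, idx):
    """Accuracy in percent on the evaluation nodes `idx`."""
    return 100.0 * float(np.mean(pred[idx] == labels[idx]))


# Toy usage: 3 classes with 50 nodes each; all non-training nodes form the test set.
labels = np.repeat(np.arange(3), 50)
train_idx = per_class_split(labels)
test_idx = np.setdiff1d(np.arange(len(labels)), train_idx)
print(len(train_idx), np.bincount(labels[train_idx]))
```

Fixing the seed keeps the labeled set identical across runs, which is what makes results on a "fixed" split comparable between models.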

Related Papers

Demystifying Distributed Training of Graph Neural Networks for Link Prediction (2025-06-25)
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models (2025-06-17)
Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark (2025-06-14)
Graph Semi-Supervised Learning for Point Classification on Data Manifolds (2025-06-13)
Devil's Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols (2025-06-11)
Wasserstein Hypergraph Neural Network (2025-06-11)
Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning (2025-06-05)
iN2V: Bringing Transductive Node Embeddings to Inductive Graphs (2025-06-05)