Original Paper
Predict then Propagate: Graph Neural Networks meet Personalized PageRank
Gasteiger et al. (2018). ICLR 2019
What APPNP does
APPNP operates in two distinct phases:
- Predict: Apply an MLP to each node's features to produce initial predictions. No graph structure is used yet.
- Propagate: Smooth the predictions using personalized PageRank. At each step, mix the neighborhood average with the node's original prediction (teleport).
The teleport mechanism is the key innovation: it ensures each node retains its own prediction signal even after many propagation steps, preventing the over-smoothing that plagues deep GCNs.
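The teleport's effect can be checked numerically. Below is a minimal NumPy sketch (the 4-node cycle graph and the one-hot "predictions" are invented for illustration): with no teleport (alpha = 0), repeated neighborhood averaging drives every node to the same value, while alpha = 0.1 preserves node 0's distinctive prediction.

```python
import numpy as np

# Toy graph: a 4-node cycle, with GCN-style symmetric normalization
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                       # add self-loops
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))    # D^(-1/2) (A + I) D^(-1/2)

Z = np.array([[1.0], [0.0], [0.0], [0.0]])  # one node with a distinctive prediction

def propagate(alpha, K=50):
    """Run K steps of H = (1 - alpha) * A_norm @ H + alpha * Z."""
    H = Z.copy()
    for _ in range(K):
        H = (1 - alpha) * A_norm @ H + alpha * Z
    return H

no_teleport = propagate(alpha=0.0)  # plain smoothing: all rows converge together
teleport = propagate(alpha=0.1)     # teleport keeps node 0 distinct

print(no_teleport.std())  # essentially 0: over-smoothed
print(teleport.std())     # clearly nonzero: node identities preserved
```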
The math (simplified)
# Phase 1: Predict (MLP, no graph)
Z = MLP(X) # initial predictions from node features
# Phase 2: Propagate (personalized PageRank)
H^(0) = Z
H^(k) = (1 - alpha) · A_norm · H^(k-1) + alpha · Z
Where:
alpha = teleport probability (e.g., 0.1)
A_norm = symmetrically normalized adjacency matrix with self-loops, D^(-1/2) (A + I) D^(-1/2), as in GCN
K = number of propagation steps (e.g., 10)
Z = original MLP predictions (always mixed back in)
Final output: H^(K)
The alpha*Z term is the teleport: it always adds back the original prediction, preventing all nodes from converging to the same representation.
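This recursion is a power iteration whose fixed point, as the paper shows, is the exact personalized-PageRank smoothing alpha * (I - (1 - alpha) * A_norm)^(-1) @ Z. The NumPy sketch below (random graph invented for illustration) verifies that the iteration approaches this closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, alpha, K = 8, 3, 0.1, 200

# Random undirected graph with GCN-style symmetric normalization
A = np.triu(rng.random((n, n)) < 0.3, k=1).astype(float)
A = A + A.T
A_hat = A + np.eye(n)                       # add self-loops
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))    # D^(-1/2) (A + I) D^(-1/2)

Z = rng.random((n, c))   # stand-in for the MLP predictions
H = Z.copy()
for _ in range(K):       # H^(k) = (1 - alpha) * A_norm @ H^(k-1) + alpha * Z
    H = (1 - alpha) * A_norm @ H + alpha * Z

# Closed-form personalized-PageRank limit
H_inf = alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * A_norm, Z)
print(np.abs(H - H_inf).max())  # ~0: the iteration approximates exact PPR
```

Each step shrinks the remaining error by a factor of at most (1 - alpha), which is why a modest K (e.g., 10) already gives a good approximation in practice.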
PyG implementation
import torch
import torch.nn.functional as F
from torch_geometric.nn import APPNP

class APPNPNet(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels,
                 K=10, alpha=0.1):
        super().__init__()
        # Prediction MLP
        self.lin1 = torch.nn.Linear(in_channels, hidden_channels)
        self.lin2 = torch.nn.Linear(hidden_channels, out_channels)
        # Propagation layer (no learned parameters)
        self.prop = APPNP(K=K, alpha=alpha)

    def forward(self, x, edge_index):
        # Phase 1: Predict
        x = F.relu(self.lin1(x))
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin2(x)
        # Phase 2: Propagate
        x = self.prop(x, edge_index)
        return x

# 10 hops of propagation with only 2 linear layers of parameters
model = APPNPNet(dataset.num_features, 64, dataset.num_classes,
                 K=10, alpha=0.1)

The MLP has fixed depth (2 layers here) regardless of propagation depth K, so APPNP is considerably more parameter-efficient than stacking 10 GCNConv layers.
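As a back-of-the-envelope check (dimensions assumed, roughly Cora-sized; the counts are an illustration, not from the paper), tallying Linear-layer weights shows the gap, and that APPNP's count does not depend on K:

```python
def linear_params(fan_in, fan_out):
    return fan_in * fan_out + fan_out     # weight matrix + bias

in_ch, hid, out_ch, K = 1433, 64, 7, 10   # assumed Cora-like dimensions

# APPNP: 2-layer MLP; the propagation step has no learned weights,
# so this count is the same for K = 10 or K = 100.
appnp_params = linear_params(in_ch, hid) + linear_params(hid, out_ch)

# A K-layer GCN stack with the same hidden width: K weight matrices,
# and the count keeps growing with depth.
gcn_params = (linear_params(in_ch, hid)
              + (K - 2) * linear_params(hid, hid)
              + linear_params(hid, out_ch))

print(appnp_params, gcn_params)  # 92231 125511
```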
When to use APPNP
- When you need long-range propagation. If 2-3 hops of GCNConv are not enough context, APPNP gives you 10+ hops without over-smoothing.
- Semi-supervised node classification. APPNP was designed for settings with few labeled nodes. The deep propagation spreads label information further across the graph.
- When you want fewer parameters. APPNP's MLP + propagation approach uses fewer parameters than a comparably deep GCN stack, because the propagation step itself is parameter-free, making it cheaper to train.
- Homogeneous, undirected graphs. Like GCNConv, APPNP is designed for single-type graphs with symmetric edges.
When not to use APPNP
- Heterogeneous graphs. The propagation uses a single adjacency matrix with no type-specific treatment. Use HGTConv or RGCNConv for multi-type graphs.
- When neighbor importance varies. APPNP uses uniform propagation (like GCN). For attention-weighted propagation, use GATConv or TransformerConv.