Original Paper
Predict then Propagate: Graph Neural Networks meet Personalized PageRank
Gasteiger et al. (2018). ICLR 2019
What APPNP does
APPNP operates in two distinct phases:
- Predict: Apply an MLP to each node's features to produce initial predictions. No graph structure is used yet.
- Propagate: Smooth the predictions using personalized PageRank. At each step, mix the neighborhood average with the node's original prediction (teleport).
The teleport mechanism is the key innovation: it ensures each node retains its own prediction signal even after many propagation steps, preventing the over-smoothing that plagues deep GCNs.
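The teleport's effect can be checked numerically. Below is a minimal NumPy sketch (the 4-node cycle graph and the one-hot "predictions" are invented for illustration): with no teleport (alpha = 0), repeated neighborhood averaging drives every node to the same value, while alpha = 0.1 preserves node 0's distinctive prediction.

```python
import numpy as np

# Toy graph: a 4-node cycle, with GCN-style symmetric normalization
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                       # add self-loops
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))    # D^(-1/2) (A + I) D^(-1/2)

Z = np.array([[1.0], [0.0], [0.0], [0.0]])  # one node with a distinctive prediction

def propagate(alpha, K=50):
    """Run K steps of H = (1 - alpha) * A_norm @ H + alpha * Z."""
    H = Z.copy()
    for _ in range(K):
        H = (1 - alpha) * A_norm @ H + alpha * Z
    return H

no_teleport = propagate(alpha=0.0)  # plain smoothing: all rows converge together
teleport = propagate(alpha=0.1)     # teleport keeps node 0 distinct

print(no_teleport.std())  # essentially 0: over-smoothed
print(teleport.std())     # clearly nonzero: node identities preserved
```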
The math (simplified)
# Phase 1: Predict (MLP, no graph)
Z = MLP(X) # initial predictions from node features
# Phase 2: Propagate (personalized PageRank)
H^(0) = Z
H^(k) = (1 - alpha) · A_norm · H^(k-1) + alpha · Z
Where:
alpha = teleport probability (e.g., 0.1)
A_norm = symmetrically normalized adjacency matrix with self-loops, D^(-1/2) (A + I) D^(-1/2), as in GCN
K = number of propagation steps (e.g., 10)
Z = original MLP predictions (always mixed back in)
Final output: H^(K)
The alpha*Z term is the teleport: it always adds back the original prediction, preventing all nodes from converging to the same representation.
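This recursion is a power iteration whose fixed point, as the paper shows, is the exact personalized-PageRank smoothing alpha * (I - (1 - alpha) * A_norm)^(-1) @ Z. The NumPy sketch below (random graph invented for illustration) verifies that the iteration approaches this closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, alpha, K = 8, 3, 0.1, 200

# Random undirected graph with GCN-style symmetric normalization
A = np.triu(rng.random((n, n)) < 0.3, k=1).astype(float)
A = A + A.T
A_hat = A + np.eye(n)                       # add self-loops
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))    # D^(-1/2) (A + I) D^(-1/2)

Z = rng.random((n, c))   # stand-in for the MLP predictions
H = Z.copy()
for _ in range(K):       # H^(k) = (1 - alpha) * A_norm @ H^(k-1) + alpha * Z
    H = (1 - alpha) * A_norm @ H + alpha * Z

# Closed-form personalized-PageRank limit
H_inf = alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * A_norm, Z)
print(np.abs(H - H_inf).max())  # ~0: the iteration approximates exact PPR
```

Each step shrinks the remaining error by a factor of at most (1 - alpha), which is why a modest K (e.g., 10) already gives a good approximation in practice.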
PyG implementation
import torch
import torch.nn.functional as F
from torch_geometric.nn import APPNP

class APPNPNet(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels,
                 K=10, alpha=0.1):
        super().__init__()
        # Prediction MLP
        self.lin1 = torch.nn.Linear(in_channels, hidden_channels)
        self.lin2 = torch.nn.Linear(hidden_channels, out_channels)
        # Propagation layer (no learned parameters)
        self.prop = APPNP(K=K, alpha=alpha)

    def forward(self, x, edge_index):
        # Phase 1: Predict
        x = F.relu(self.lin1(x))
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin2(x)
        # Phase 2: Propagate
        x = self.prop(x, edge_index)
        return x

# 10 hops of propagation with only 2 linear layers of parameters
model = APPNPNet(dataset.num_features, 64, dataset.num_classes,
                 K=10, alpha=0.1)

The MLP has fixed depth (2 layers here) regardless of propagation depth K, so APPNP is considerably more parameter-efficient than stacking 10 GCNConv layers.
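As a back-of-the-envelope check (dimensions assumed, roughly Cora-sized; the counts are an illustration, not from the paper), tallying Linear-layer weights shows the gap, and that APPNP's count does not depend on K:

```python
def linear_params(fan_in, fan_out):
    return fan_in * fan_out + fan_out     # weight matrix + bias

in_ch, hid, out_ch, K = 1433, 64, 7, 10   # assumed Cora-like dimensions

# APPNP: 2-layer MLP; the propagation step has no learned weights,
# so this count is the same for K = 10 or K = 100.
appnp_params = linear_params(in_ch, hid) + linear_params(hid, out_ch)

# A K-layer GCN stack with the same hidden width: K weight matrices,
# and the count keeps growing with depth.
gcn_params = (linear_params(in_ch, hid)
              + (K - 2) * linear_params(hid, hid)
              + linear_params(hid, out_ch))

print(appnp_params, gcn_params)  # 92231 125511
```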
When to use APPNP
- When you need long-range propagation. If 2-3 hops of GCNConv are not enough context, APPNP gives you 10+ hops without over-smoothing.
- Semi-supervised node classification. APPNP was designed for settings with few labeled nodes. The deep propagation spreads label information further across the graph.
- When you want fewer parameters. APPNP's MLP + propagation approach uses fewer parameters than a comparably deep GCN stack, because the propagation step itself is parameter-free, making it cheaper to train.
- Homogeneous, undirected graphs. Like GCNConv, APPNP is designed for single-type graphs with symmetric edges.
When not to use APPNP
- Heterogeneous graphs. The propagation uses a single adjacency matrix with no type-specific treatment. Use HGTConv or RGCNConv for multi-type graphs.
- When neighbor importance varies. APPNP uses uniform propagation (like GCN). For attention-weighted propagation, use GATConv or TransformerConv.