What GCN2Conv does
GCN2Conv modifies GCNConv with two mechanisms:
- Initial residual connection: At each layer, mix the current representation with the original input features. This ensures the initial signal is never completely lost.
- Identity mapping: Add a scaled identity matrix to the weight matrix, ensuring the transformation stays close to the identity function. This prevents each layer from distorting the representation too much.
The math (simplified)
# Standard GCNConv (over-smooths at depth)
H^(l) = sigma( A_norm · H^(l-1) · W^(l) )
# GCN2Conv (stable at depth)
H^(l) = sigma( A_norm · ((1-alpha) · H^(l-1) + alpha · H^(0))
· ((1-beta) · I + beta · W^(l)) )
Where:
- H^(0) = initial features (always accessible via the residual)
- alpha = initial residual weight (how much of H^(0) to mix in)
- beta = identity mapping weight; beta_l = log(theta / l + 1), which decays toward 0 as the layer index l grows
- I = identity matrix
- l = layer index

Two additions work together: (1-alpha) · H^(l-1) + alpha · H^(0) preserves the input features at every layer, and (1-beta) · I + beta · W^(l) keeps each layer's transformation close to the identity.
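To make the update rule concrete, here is a minimal NumPy sketch of a single GCN2Conv-style layer. The helper name `gcn2_layer` and the tiny graph are made up for illustration; the real PyG layer additionally handles sparse adjacency, edge weights, and normalization caching.

```python
import numpy as np

def gcn2_layer(A_norm, H, H0, W, alpha, beta):
    """One GCNII-style update (illustrative sketch, not PyG's implementation)."""
    # Initial residual: mix propagated features with the original input H0
    P = (1 - alpha) * (A_norm @ H) + alpha * H0
    # Identity mapping: keep the transformation near the identity matrix
    d = W.shape[0]
    T = (1 - beta) * np.eye(d) + beta * W
    return np.maximum(P @ T, 0)  # ReLU

# Tiny 3-node path graph with self-loops, symmetrically normalized
A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])
deg = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(deg, deg))

rng = np.random.default_rng(0)
H0 = np.maximum(rng.normal(size=(3, 4)), 0)  # stands in for relu(lin_in(x))
W = rng.normal(size=(4, 4))

# Layer 1: beta_1 = log(theta / 1 + 1) with theta = 0.5
H = gcn2_layer(A_norm, H0, H0, W, alpha=0.1, beta=np.log(0.5 / 1 + 1))
print(H.shape)  # (3, 4)
```

Note how even if `W` were badly scaled, the `(1 - beta) * I` term dominates the transform, so stacking many such layers cannot collapse the representation.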
PyG implementation
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCN2Conv

class GCNII(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels,
                 num_layers=64, alpha=0.1, theta=0.5):
        super().__init__()
        self.lin_in = torch.nn.Linear(in_channels, hidden_channels)
        self.lin_out = torch.nn.Linear(hidden_channels, out_channels)
        self.convs = torch.nn.ModuleList()
        for layer in range(num_layers):
            self.convs.append(GCN2Conv(
                hidden_channels, alpha=alpha, theta=theta,
                layer=layer + 1, shared_weights=True
            ))

    def forward(self, x, edge_index):
        x = x_0 = F.relu(self.lin_in(x))
        for conv in self.convs:
            x = F.dropout(x, p=0.6, training=self.training)
            x = conv(x, x_0, edge_index)  # x_0 is the initial representation
            x = F.relu(x)
        x = F.dropout(x, p=0.6, training=self.training)
        return self.lin_out(x)

# 64 layers deep, still works!
model = GCNII(dataset.num_features, 64, dataset.num_classes,
              num_layers=64, alpha=0.1, theta=0.5)
```

Note: x_0 (the initial hidden representation) is passed to every layer for the initial residual connection. shared_weights=True uses a single weight matrix per layer for both the propagated and initial-residual terms (the GCNII variant); setting it to False gives each term its own weights (GCNII*), at the cost of more parameters.
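The layer=layer + 1 argument matters: it sets the per-layer identity-mapping strength beta_l = log(theta / l + 1), which shrinks with depth. A quick calculation shows why deep layers barely perturb the representation:

```python
import math

# beta_l = log(theta / l + 1) for theta = 0.5 at layers 1, 16, and 64.
# Deeper layers get a smaller beta, so their transform stays closer to
# the identity matrix and cannot distort the features much.
theta = 0.5
betas = [math.log(theta / l + 1) for l in (1, 16, 64)]
print([round(b, 3) for b in betas])  # [0.405, 0.031, 0.008]
```

By layer 64 the learned weights contribute under 1% of the transformation, which is what lets the 64-layer model above train stably.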
When to use GCN2Conv
- When you need deep GNNs. Tasks requiring 5+ hops of context benefit from GCN2Conv's ability to go deep without degradation.
- Large-diameter graphs. Graphs where important context is many hops away (e.g., molecular chains, infrastructure networks) need deep propagation.
- When APPNP's decoupled approach is too restrictive. If you want per-layer transformations (not just propagation), GCN2Conv gives you depth with coupled transform-propagate at each layer.
When not to use GCN2Conv
- When 2-3 layers suffice. Most node classification tasks need only 2-3 hops. GCN2Conv's overhead is not justified when shallow models work.
- Heterogeneous graphs. GCN2Conv is designed for homogeneous, undirected graphs. For multi-type data, use HGTConv.