
GCN2Conv: Going Deep Without Over-Smoothing

Standard GCNConv degrades after 3-4 layers because over-smoothing makes all representations converge. GCN2Conv fixes this with two simple additions: initial residual connections and identity mapping. The result is a GCN that works with 64+ layers.

PyTorch Geometric

TL;DR

  • GCN2Conv adds initial residual connections (skip to input features) and identity mapping (weight matrix close to identity) to GCNConv. These two changes enable 64+ layer GNNs.
  • Over-smoothing is the core problem: GCN representations converge as depth increases. GCN2Conv's skip connections preserve the original signal through deep networks.
  • Parameters alpha (residual strength) and theta (identity mapping strength) control the depth-expressiveness tradeoff. Typical: alpha=0.1-0.5, theta=0.5-1.5.
  • Use GCN2Conv when your task needs long-range context but you want to keep the coupled transformation-propagation paradigm (unlike APPNP's decoupled approach).

Original Paper

Simple and Deep Graph Convolutional Networks

Chen et al. (2020). ICML 2020


What GCN2Conv does

GCN2Conv modifies GCNConv with two mechanisms:

  1. Initial residual connection: At each layer, mix the current representation with the original input features. This ensures the initial signal is never completely lost.
  2. Identity mapping: Add a scaled identity matrix to the weight matrix, ensuring the transformation stays close to the identity function. This prevents each layer from distorting the representation too much.

The math (simplified)

GCN2Conv formula
# Standard GCNConv (over-smooths at depth)
H^(l) = sigma( A_norm · H^(l-1) · W^(l) )

# GCN2Conv (stable at depth)
H^(l) = sigma( ((1-alpha) · A_norm · H^(l-1) + alpha · H^(0))
         · ((1-beta) · I + beta · W^(l)) )

Where:
  H^(0)  = initial features (always accessible via residual)
  alpha  = initial residual weight (how much of H^(0) to mix in)
  beta   = identity mapping weight (beta = log(theta/l + 1), decaying toward 0 as l grows)
  I      = identity matrix
  l      = layer number

Two additions: (1-alpha) · A_norm · H^(l-1) + alpha · H^(0) mixes the propagated signal with the untouched input features, and (1-beta) · I + beta · W^(l) keeps each layer's transformation near the identity.
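The beta schedule is worth seeing concretely. A minimal sketch (the `beta_schedule` helper is ours, not part of PyG; PyG computes the same value internally from `theta` and `layer`) shows how the identity-mapping weight decays with depth, so late layers barely perturb the representation:

```python
import math

def beta_schedule(theta, num_layers):
    """Identity-mapping weight per layer: beta_l = log(theta / l + 1)."""
    return [math.log(theta / l + 1) for l in range(1, num_layers + 1)]

betas = beta_schedule(theta=0.5, num_layers=64)
print(round(betas[0], 3))   # layer 1:  ~0.405
print(round(betas[-1], 4))  # layer 64: ~0.0078
```

By layer 64 the transform is more than 99% identity, which is exactly why stacking many layers stays stable.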

PyG implementation

gcn2_model.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCN2Conv

class GCNII(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels,
                 num_layers=64, alpha=0.1, theta=0.5):
        super().__init__()
        self.lin_in = torch.nn.Linear(in_channels, hidden_channels)
        self.lin_out = torch.nn.Linear(hidden_channels, out_channels)
        self.convs = torch.nn.ModuleList()
        for layer in range(num_layers):
            self.convs.append(GCN2Conv(
                hidden_channels, alpha=alpha, theta=theta,
                layer=layer + 1, shared_weights=True
            ))

    def forward(self, x, edge_index):
        x = x_0 = F.relu(self.lin_in(x))
        for conv in self.convs:
            x = F.dropout(x, p=0.6, training=self.training)
            x = conv(x, x_0, edge_index)  # x_0 is initial features
            x = F.relu(x)
        x = F.dropout(x, p=0.6, training=self.training)
        return self.lin_out(x)

# 64 layers deep, still works!
model = GCNII(dataset.num_features, 64, dataset.num_classes,
              num_layers=64, alpha=0.1, theta=0.5)

Note: x_0 (initial features) is passed to every layer for the residual connection. shared_weights=True (the default) uses a single weight matrix per layer for both the smoothed representation and the initial residual; setting it to False uses separate matrices, which is the GCNII* variant from the paper.

When to use GCN2Conv

  • When you need deep GNNs. Tasks requiring 5+ hops of context benefit from GCN2Conv's ability to go deep without degradation.
  • Large-diameter graphs. Graphs where important context is many hops away (e.g., molecular chains, infrastructure networks) need deep propagation.
  • When APPNP's decoupled approach is too restrictive. If you want per-layer transformations (not just propagation), GCN2Conv gives you depth with coupled transform-propagate at each layer.

When not to use GCN2Conv

  • When 2-3 layers suffice. Most node classification tasks need only 2-3 hops. GCN2Conv's overhead is not justified when shallow models work.
  • Heterogeneous graphs. GCN2Conv is designed for homogeneous, undirected graphs. For multi-type data, use HGTConv.

Frequently asked questions

What is GCN2Conv in PyTorch Geometric?

GCN2Conv implements GCNII from Chen et al. (2020), which adds two mechanisms to GCNConv to enable deep (64+ layer) graph networks: initial residual connections (skip to the input features) and identity mapping (adding a scaled identity to the weight matrix). Together, these prevent over-smoothing.

How does GCN2Conv fix over-smoothing?

Over-smoothing occurs when stacking GCN layers causes all node representations to converge. GCN2Conv fixes this with: (1) initial residual connections that mix each layer's output with the original input features, preserving the initial signal; (2) identity mapping that ensures the weight matrix is close to the identity, preventing information loss.
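The convergence, and the rescue by the residual, can be seen without any learned weights at all. A minimal numpy sketch (toy 4-node cycle, everything here made up for illustration) applies pure normalized-adjacency propagation versus propagation mixed with an alpha-weighted initial residual:

```python
import numpy as np

# Toy 4-node cycle with self-loops; symmetric normalization as in GCN.
A = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=float)
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_norm = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

H0 = np.random.RandomState(0).randn(4, 8)

# Plain propagation: all node rows collapse to the same vector.
H = H0.copy()
for _ in range(50):
    H = A_norm @ H
plain_spread = H.std(axis=0).max()   # ~0: nodes indistinguishable

# With an initial residual (alpha = 0.1), the input signal survives.
H = H0.copy()
for _ in range(50):
    H = 0.9 * (A_norm @ H) + 0.1 * H0
residual_spread = H.std(axis=0).max()  # clearly nonzero: nodes stay distinct

print(plain_spread, residual_spread)
```

After 50 rounds the plain version has essentially zero variation across nodes, while the residual version keeps a fixed fraction of the original per-node signal, which is the intuition behind GCN2Conv's first mechanism.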

How many layers can GCN2Conv support?

GCN2Conv has been demonstrated with 64 layers on citation benchmarks while maintaining strong performance. Standard GCNConv degrades rapidly after 3-4 layers. The key parameters controlling depth are alpha (residual weight) and theta (identity mapping strength).

What are the alpha and theta parameters in GCN2Conv?

Alpha controls the initial residual connection weight: how much of the original input features to mix in at each layer. Theta controls the identity mapping strength: how close the weight matrix stays to the identity matrix. Typical values: alpha=0.1-0.5, theta=0.5-1.5. Alpha is usually held fixed across layers, while the effective identity-mapping weight beta = log(theta/l + 1) shrinks as the layer index l grows, so deeper layers stay closer to the identity.

When should I use GCN2Conv vs APPNP?

Both enable deep information flow. APPNP decouples transformation from propagation (separate MLP + PageRank). GCN2Conv keeps them coupled but adds skip connections. APPNP is simpler and faster. GCN2Conv is more flexible, allowing different transformations at each layer. Use APPNP when simplicity matters, GCN2Conv when per-layer expressiveness matters.
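The structural difference can be sketched in a few lines of numpy (the adjacency, weights, and sizes are all made up; nonlinearities omitted): APPNP transforms once and then only propagates, while a GCNII-style update applies a near-identity transform inside every propagation step:

```python
import numpy as np

rng = np.random.RandomState(0)
A_norm = np.full((4, 4), 0.25)       # toy normalized adjacency (made up)
X = rng.randn(4, 8)
W_in = rng.randn(8, 8) * 0.1         # toy input weights (made up)
H0 = X @ W_in                        # shared "MLP" output

# APPNP-style (decoupled): pure personalized-PageRank propagation
Z = H0.copy()
for _ in range(16):
    Z = 0.9 * (A_norm @ Z) + 0.1 * H0

# GCNII-style (coupled): per-layer transform inside each step
H = H0.copy()
for l in range(1, 17):
    beta = np.log(0.5 / l + 1)
    W_l = rng.randn(8, 8) * 0.1      # per-layer weights (made up)
    S = 0.9 * (A_norm @ H) + 0.1 * H0
    H = S @ ((1 - beta) * np.eye(8) + beta * W_l)

print(Z.shape, H.shape)
```

The APPNP loop has no parameters at all after the initial transform; the GCNII loop learns a (heavily identity-regularized) W at every depth, which is where its extra per-layer expressiveness comes from.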

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.