
GCN2Conv: Going Deep Without Over-Smoothing

Standard GCNConv degrades after 3-4 layers because over-smoothing makes all representations converge. GCN2Conv fixes this with two simple additions: initial residual connections and identity mapping. The result is a GCN that works with 64+ layers.

PyTorch Geometric

TL;DR

  • GCN2Conv adds initial residual connections (skip to input features) and identity mapping (weight matrix close to identity) to GCNConv. These two changes enable 64+ layer GNNs.
  • Over-smoothing is the core problem: GCN representations converge as depth increases. GCN2Conv's skip connections preserve the original signal through deep networks.
  • Parameters alpha (residual strength) and theta (identity mapping strength) control the depth-expressiveness tradeoff. Typical: alpha=0.1-0.5, theta=0.5-1.5.
  • Use GCN2Conv when your task needs long-range context but you want to keep the coupled transformation-propagation paradigm (unlike APPNP's decoupled approach).

Original Paper

Simple and Deep Graph Convolutional Networks

Chen et al. (2020). ICML 2020


What GCN2Conv does

GCN2Conv modifies GCNConv with two mechanisms:

  1. Initial residual connection: At each layer, mix the current representation with the original input features. This ensures the initial signal is never completely lost.
  2. Identity mapping: Add a scaled identity matrix to the weight matrix, ensuring the transformation stays close to the identity function. This prevents each layer from distorting the representation too much.

The math (simplified)

GCN2Conv formula
# Standard GCNConv (over-smooths at depth)
H^(l) = sigma( A_norm · H^(l-1) · W^(l) )

# GCN2Conv (stable at depth)
H^(l) = sigma( ((1-alpha) · A_norm · H^(l-1) + alpha · H^(0))
         · ((1-beta) · I + beta · W^(l)) )

Where:
  H^(0)  = initial features (always accessible via residual)
  alpha  = initial residual weight (how much of H^(0) to mix in)
  beta   = identity mapping weight (beta = log(theta/l + 1), decaying toward 0 as l grows)
  I      = identity matrix
  l      = layer number

Two additions: (1-alpha) · A_norm · H^(l-1) + alpha · H^(0) mixes the propagated signal with the untouched input features, and (1-beta) · I + beta · W^(l) keeps each layer's transformation near the identity.
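The beta schedule is worth seeing concretely. A minimal sketch (the `beta_schedule` helper is ours, not part of PyG; PyG computes the same value internally from `theta` and `layer`) shows how the identity-mapping weight decays with depth, so late layers barely perturb the representation:

```python
import math

def beta_schedule(theta, num_layers):
    """Identity-mapping weight per layer: beta_l = log(theta / l + 1)."""
    return [math.log(theta / l + 1) for l in range(1, num_layers + 1)]

betas = beta_schedule(theta=0.5, num_layers=64)
print(round(betas[0], 3))   # layer 1:  ~0.405
print(round(betas[-1], 4))  # layer 64: ~0.0078
```

By layer 64 the transform is more than 99% identity, which is exactly why stacking many layers stays stable.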

PyG implementation

gcn2_model.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCN2Conv

class GCNII(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels,
                 num_layers=64, alpha=0.1, theta=0.5):
        super().__init__()
        self.lin_in = torch.nn.Linear(in_channels, hidden_channels)
        self.lin_out = torch.nn.Linear(hidden_channels, out_channels)
        self.convs = torch.nn.ModuleList()
        for layer in range(num_layers):
            self.convs.append(GCN2Conv(
                hidden_channels, alpha=alpha, theta=theta,
                layer=layer + 1, shared_weights=True
            ))

    def forward(self, x, edge_index):
        x = x_0 = F.relu(self.lin_in(x))
        for conv in self.convs:
            x = F.dropout(x, p=0.6, training=self.training)
            x = conv(x, x_0, edge_index)  # x_0 is initial features
            x = F.relu(x)
        x = F.dropout(x, p=0.6, training=self.training)
        return self.lin_out(x)

# 64 layers deep, still works!
model = GCNII(dataset.num_features, 64, dataset.num_classes,
              num_layers=64, alpha=0.1, theta=0.5)

Note: x_0 (initial features) is passed to every layer for the residual connection. shared_weights=True (the default) uses a single weight matrix per layer for both the smoothed representation and the initial residual; setting it to False uses separate matrices, which is the GCNII* variant from the paper.

When to use GCN2Conv

  • When you need deep GNNs. Tasks requiring 5+ hops of context benefit from GCN2Conv's ability to go deep without degradation.
  • Large-diameter graphs. Graphs where important context is many hops away (e.g., molecular chains, infrastructure networks) need deep propagation.
  • When APPNP's decoupled approach is too restrictive. If you want per-layer transformations (not just propagation), GCN2Conv gives you depth with coupled transform-propagate at each layer.

When not to use GCN2Conv

  • When 2-3 layers suffice. Most node classification tasks need only 2-3 hops. GCN2Conv's overhead is not justified when shallow models work.
  • Heterogeneous graphs. GCN2Conv is designed for homogeneous, undirected graphs. For multi-type data, use HGTConv.

Frequently asked questions

What is GCN2Conv in PyTorch Geometric?

GCN2Conv implements GCNII from Chen et al. (2020), which adds two mechanisms to GCNConv to enable deep (64+ layer) graph networks: initial residual connections (skip to the input features) and identity mapping (adding a scaled identity to the weight matrix). Together, these prevent over-smoothing.

How does GCN2Conv fix over-smoothing?

Over-smoothing occurs when stacking GCN layers causes all node representations to converge. GCN2Conv fixes this with: (1) initial residual connections that mix each layer's output with the original input features, preserving the initial signal; (2) identity mapping that ensures the weight matrix is close to the identity, preventing information loss.
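The convergence, and the rescue by the residual, can be seen without any learned weights at all. A minimal numpy sketch (toy 4-node cycle, everything here made up for illustration) applies pure normalized-adjacency propagation versus propagation mixed with an alpha-weighted initial residual:

```python
import numpy as np

# Toy 4-node cycle with self-loops; symmetric normalization as in GCN.
A = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=float)
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_norm = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

H0 = np.random.RandomState(0).randn(4, 8)

# Plain propagation: all node rows collapse to the same vector.
H = H0.copy()
for _ in range(50):
    H = A_norm @ H
plain_spread = H.std(axis=0).max()   # ~0: nodes indistinguishable

# With an initial residual (alpha = 0.1), the input signal survives.
H = H0.copy()
for _ in range(50):
    H = 0.9 * (A_norm @ H) + 0.1 * H0
residual_spread = H.std(axis=0).max()  # clearly nonzero: nodes stay distinct

print(plain_spread, residual_spread)
```

After 50 rounds the plain version has essentially zero variation across nodes, while the residual version keeps a fixed fraction of the original per-node signal, which is the intuition behind GCN2Conv's first mechanism.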

How many layers can GCN2Conv support?

GCN2Conv has been demonstrated with 64 layers on citation benchmarks while maintaining strong performance. Standard GCNConv degrades rapidly after 3-4 layers. The key parameters controlling depth are alpha (residual weight) and theta (identity mapping strength).

What are the alpha and theta parameters in GCN2Conv?

Alpha controls the initial residual connection weight: how much of the original input features to mix in at each layer. Theta controls the identity mapping strength: how close the weight matrix stays to the identity matrix. Typical values: alpha=0.1-0.5, theta=0.5-1.5. Alpha is usually held fixed across layers, while the effective identity-mapping weight beta = log(theta/l + 1) shrinks as the layer index l grows, so deeper layers stay closer to the identity.

When should I use GCN2Conv vs APPNP?

Both enable deep information flow. APPNP decouples transformation from propagation (separate MLP + PageRank). GCN2Conv keeps them coupled but adds skip connections. APPNP is simpler and faster. GCN2Conv is more flexible, allowing different transformations at each layer. Use APPNP when simplicity matters, GCN2Conv when per-layer expressiveness matters.
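The structural difference can be sketched in a few lines of numpy (the adjacency, weights, and sizes are all made up; nonlinearities omitted): APPNP transforms once and then only propagates, while a GCNII-style update applies a near-identity transform inside every propagation step:

```python
import numpy as np

rng = np.random.RandomState(0)
A_norm = np.full((4, 4), 0.25)       # toy normalized adjacency (made up)
X = rng.randn(4, 8)
W_in = rng.randn(8, 8) * 0.1         # toy input weights (made up)
H0 = X @ W_in                        # shared "MLP" output

# APPNP-style (decoupled): pure personalized-PageRank propagation
Z = H0.copy()
for _ in range(16):
    Z = 0.9 * (A_norm @ Z) + 0.1 * H0

# GCNII-style (coupled): per-layer transform inside each step
H = H0.copy()
for l in range(1, 17):
    beta = np.log(0.5 / l + 1)
    W_l = rng.randn(8, 8) * 0.1      # per-layer weights (made up)
    S = 0.9 * (A_norm @ H) + 0.1 * H0
    H = S @ ((1 - beta) * np.eye(8) + beta * W_l)

print(Z.shape, H.shape)
```

The APPNP loop has no parameters at all after the initial transform; the GCNII loop learns a (heavily identity-regularized) W at every depth, which is where its extra per-layer expressiveness comes from.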

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.