
GENConv: Going Deeper with Generalized Aggregation

GENConv from the DeeperGCN paper enables training GNNs with 28+ layers using three key techniques: generalized aggregation, pre-activation residuals, and message normalization. It is the go-to layer when depth matters for molecular and point cloud tasks.

PyTorch Geometric

TL;DR

  • GENConv uses a softmax-based generalized aggregation that interpolates between mean and max via a learnable temperature parameter, adapting to the task automatically.
  • Combined with pre-activation residual connections (BN -> ReLU -> Conv) and message normalization, it enables 28+ layer GNNs without over-smoothing.
  • Strong on molecular property prediction (OGB-MolHIV) and point cloud tasks, where deep propagation captures complex spatial and chemical patterns.
  • More expressive than GCNConv at depth. An alternative to GCN2Conv for deep GNN training, with a different design philosophy (learnable aggregation vs initial residuals and identity mapping).

Original Paper

DeeperGCN: All You Need to Train Deeper GCNs

Li et al. (2020). arXiv preprint

Read paper →

What GENConv does

GENConv is the message-passing layer from the DeeperGCN framework. It introduces a generalized aggregation function:

  1. Compute messages from neighbors (adding edge features when provided)
  2. Aggregate with a softmax-weighted sum governed by a learnable temperature
  3. Update the result with an MLP; the surrounding DeeperGCN block adds the residual connection and normalization

The math (simplified)

GENConv formula
# Generalized aggregation (softmax variant)
m_j = ReLU(h_j + e_ij) + eps           # message with optional edge features
w_j = exp(m_j / t) / Σ_k exp(m_k / t)  # per-channel softmax with temperature t
h_i' = MLP(h_i + Σ_j w_j · m_j)        # aggregate, then update with an MLP

When t -> infinity: approaches mean aggregation
When t -> 0: approaches max aggregation
Intermediate t: learnable attention-like weighting

# DeeperGCN 'res+' block (pre-activation residual)
h = BatchNorm(h_in)
h = ReLU(h)
h = Dropout(h)
h = GENConv(h, edge_index)
h_out = h_in + h  # skip connection around the whole block

The temperature parameter t is learnable, allowing the model to find the optimal aggregation between mean and max for each task. Note that the t argument of PyG's GENConv is the inverse temperature (β in the paper), so larger learned values sharpen the softmax toward max.
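The limiting behavior is easy to verify numerically. A minimal NumPy sketch for scalar messages (the `softmax_agg` helper is illustrative; GENConv applies the same weighting per feature channel):

```python
import numpy as np

def softmax_agg(msgs, t):
    """Temperature-controlled softmax aggregation of scalar messages.

    w_j = exp(m_j / t) / sum_k exp(m_k / t); returns sum_j w_j * m_j.
    """
    z = msgs / t
    w = np.exp(z - z.max())   # numerically stable softmax
    w = w / w.sum()
    return float((w * msgs).sum())

msgs = np.array([1.0, 2.0, 4.0])
print(softmax_agg(msgs, t=1e6))   # ~2.333: large t -> mean
print(softmax_agg(msgs, t=1e-3))  # 4.0:    small t -> max
```

Intermediate temperatures give the attention-like weighting in between, which is what the learnable parameter searches over during training.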

PyG implementation

gen_conv_model.py
import torch
from torch_geometric.nn import GENConv, DeepGCNLayer, global_mean_pool

class DeeperGCN(torch.nn.Module):
    def __init__(self, in_channels, hidden, out_channels, num_layers=14):
        super().__init__()
        self.node_encoder = torch.nn.Linear(in_channels, hidden)
        self.layers = torch.nn.ModuleList()
        for _ in range(num_layers):
            conv = GENConv(hidden, hidden, aggr='softmax', t=1.0,
                           learn_t=True, num_layers=2)
            norm = torch.nn.BatchNorm1d(hidden)
            act = torch.nn.ReLU(inplace=True)
            layer = DeepGCNLayer(conv, norm, act,
                                 block='res+', dropout=0.1)
            self.layers.append(layer)
        self.classifier = torch.nn.Linear(hidden, out_channels)

    def forward(self, x, edge_index, edge_attr, batch):
        # edge_attr must have `hidden` channels; encode raw edge
        # features with a Linear layer first if necessary.
        x = self.node_encoder(x)
        # 'res+' blocks pre-activate (norm -> act -> dropout -> conv),
        # so the first layer applies its conv directly and the last
        # block's output gets its norm -> act at the end.
        x = self.layers[0].conv(x, edge_index, edge_attr)
        for layer in self.layers[1:]:
            x = layer(x, edge_index, edge_attr)
        x = self.layers[0].act(self.layers[0].norm(x))
        x = global_mean_pool(x, batch)  # graph-level readout
        return self.classifier(x)

model = DeeperGCN(in_channels=9, hidden=256, out_channels=1,
                  num_layers=14)

DeepGCNLayer wraps GENConv in the pre-activation residual block: block='res+' runs norm -> act -> dropout -> conv and then adds the input back. learn_t=True makes the softmax temperature a learnable parameter.

When to use GENConv

  • Molecular property prediction. Complex chemical properties depend on long-range atomic interactions that require deep propagation to capture.
  • Point cloud processing. 3D shape understanding benefits from deep models that can propagate information across large spatial extents.
  • When you need more than 3-4 layers. Any task where GCNConv's shallow limit is a bottleneck.

When not to use GENConv

  • Shallow tasks. If 2-3 layers of GCNConv already achieve your target, GENConv's depth overhead is unnecessary.
  • Heterogeneous graphs. GENConv is designed for homogeneous graphs. Use HGTConv for multi-type data.

Frequently asked questions

What is GENConv in PyTorch Geometric?

GENConv implements the GENeralized graph convolution from the DeeperGCN paper (Li et al., 2020). It uses a generalized message aggregation that interpolates between mean, max, and softmax aggregation via a learnable parameter. Combined with pre-activation residual connections and message normalization, it enables training GNNs with 28+ layers.

How does GENConv enable deep training?

GENConv uses three techniques from DeeperGCN: (1) generalized aggregation (softmax or power-mean) that avoids the over-smoothing of plain mean aggregation, (2) pre-activation residual connections (BatchNorm -> ReLU -> Conv, rather than the conventional Conv -> BatchNorm -> ReLU order), and (3) message normalization, which rescales aggregated messages to stabilize training at depth.
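Message normalization (point 3) rescales the aggregated message to the scale of the node's own features before combining. A minimal NumPy sketch of the core operation (the `msg_norm` helper and numbers are illustrative; in PyG this corresponds to GENConv's msg_norm=True option):

```python
import numpy as np

def msg_norm(h, m, s=1.0):
    # MsgNorm (DeeperGCN): normalize the aggregated message m, rescale it
    # to the L2 norm of the node features h, then add with scale s.
    m_hat = m / np.linalg.norm(m, axis=-1, keepdims=True)
    return h + s * np.linalg.norm(h, axis=-1, keepdims=True) * m_hat

h = np.array([[3.0, 4.0]])   # ||h|| = 5
m = np.array([[0.0, 2.0]])   # normalized -> [0, 1]
print(msg_norm(h, m))        # [[3., 9.]]
```

Keeping the message on the same scale as the node features prevents the aggregated signal from dominating (or vanishing) as layers stack up.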

What is generalized aggregation in GENConv?

GENConv's aggregation interpolates between common functions via a temperature parameter: at high temperature it approaches mean, at low temperature it approaches max, and at intermediate values it acts like softmax attention. This learnable aggregation adapts to the task automatically.

When should I use GENConv vs GCN2Conv?

Both enable deep GNNs. GCN2Conv uses initial residual connections and identity mapping. GENConv uses generalized aggregation and pre-activation residuals. GENConv is more common in molecular/point cloud tasks. GCN2Conv is more common in node classification. Try both if you need depth.

How many layers can GENConv support?

The DeeperGCN paper demonstrates 28 layers on OGB-MolHIV and 56 layers on point cloud tasks. Performance typically peaks around 14-28 layers depending on the task, far beyond GCN's 2-3 layer limit.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.