Graph Convolution: The Convolutional Filter Adapted for Irregular Graph Structure

Graph convolution extends the idea of convolutional filters from regular grids (images, sequences) to arbitrary graph topologies. It is the operation that launched modern GNN research and remains the most widely used GNN layer.

TL;DR

  • Graph convolution computes a new node representation by taking a degree-normalized sum of neighbor features and passing it through a learnable weight matrix. It adapts CNNs from grids to graphs.
  • The key formula: h_i = W * sum_j(1/sqrt(deg_i * deg_j) * h_j). Degree normalization prevents high-degree nodes from dominating. Self-loops ensure nodes retain their own features.
  • On enterprise relational data, graph convolution propagates information across foreign-key edges. A customer node convolving over order neighbors is equivalent to an automatic aggregation of transaction history.
  • GCNConv is the simplest graph convolution layer. GATConv adds learned attention weights. Both are instances of the message passing framework with different aggregation strategies.
  • In PyG: GCNConv(in_channels, out_channels). One line. It handles degree normalization, self-loops, and sparse matrix multiplication internally.

Graph convolution is the operation that adapts the convolutional filter from regular grids (images, time series) to irregular graph structures where each node has a different number of neighbors. Each node updates its representation by computing a degree-normalized weighted sum of its neighbors' features and transforming the result through a learnable weight matrix. This is the foundational operation of Graph Convolutional Networks (GCNs), introduced by Kipf and Welling in 2017, and it remains the most widely deployed GNN layer in production.

Why it matters for enterprise data

Enterprise databases are relational. Tables are connected by foreign keys. When you represent this as a graph and apply graph convolution, each row automatically learns from its related rows in other tables. A product node convolves over its order neighbors to learn purchase frequency. An order node convolves over its customer neighbor to inherit customer-level signals. This cross-table information flow happens automatically through the convolution operation.
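Turning foreign keys into edges is mechanical. Here is a minimal sketch with an invented two-table schema (customers and orders; all ids and the offset scheme are illustrative, not a real API):

```python
import torch

# Hypothetical rows: customers keyed by id, and orders referencing them
# via a customer_id foreign key (toy data, invented for illustration).
customer_ids = [0, 1, 2]
orders = [  # (order_id, customer_id)
    (0, 0), (1, 0), (2, 1), (3, 2),
]

# One node per customer and one per order; order node ids are offset so
# both tables share a single node index space.
num_customers = len(customer_ids)
src = [num_customers + order_id for order_id, _ in orders]   # order nodes
dst = [customer_id for _, customer_id in orders]             # customer nodes

# Each foreign-key reference becomes one edge, in PyG's COO format
# of shape [2, num_edges].
edge_index = torch.tensor([src, dst], dtype=torch.long)
print(edge_index.shape)  # torch.Size([2, 4])
```

Each order row now sends messages to the customer row it references, which is exactly the cross-table flow described above.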

On the RelBench benchmark, models using graph convolution on relational data achieve an average AUROC of 75.83 across 30 enterprise prediction tasks, compared to 62.44 for a flat-table LightGBM baseline that requires manual feature engineering across those same tables.

How graph convolution works

The graph convolution operation for node i is:

gcn_formula.txt
h_i^(l+1) = sigma( sum_j ( 1/sqrt(deg(i)) * 1/sqrt(deg(j)) * W * h_j^(l) ) )

Where:
  h_j^(l)  = feature vector of neighbor j at layer l
  W        = learnable weight matrix (shared across all nodes)
  deg(i)   = degree of node i (number of neighbors + self-loop)
  sigma    = activation function (ReLU, typically)
  j        = ranges over all neighbors of i, including i itself

The degree normalization 1/sqrt(deg(i)*deg(j)) is what makes graph convolution different from simple sum aggregation.
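A quick numeric check makes the effect concrete (the degrees below are invented: a 100-neighbor hub and a 1-neighbor leaf, each with a self-loop added):

```python
import math

# Toy degrees, including self-loops: a hub node and a leaf node.
deg = {"hub": 101, "leaf": 2}

# Normalization coefficient for the edge leaf -> hub.
# An unnormalized sum would weight this neighbor at 1.0.
coef_leaf_to_hub = 1 / math.sqrt(deg["hub"] * deg["leaf"])
print(round(coef_leaf_to_hub, 4))  # 0.0704
```

The hub's hundred incoming messages are each scaled far below 1.0, so the aggregated vector stays on roughly the same scale as a low-degree node's, regardless of neighborhood size.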

Step-by-step breakdown

  1. Add self-loops: Each node is added as its own neighbor so it retains its own features after convolution.
  2. Compute normalization: For each edge (j → i), compute 1/sqrt(deg(i) * deg(j)). This ensures high-degree nodes do not dominate.
  3. Aggregate: For each node, sum all normalized neighbor feature vectors.
  4. Transform: Multiply the aggregated vector by weight matrix W.
  5. Activate: Apply a nonlinear activation (ReLU) to produce the new node representation.
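The five steps above can be sketched from scratch in PyTorch. This is an unoptimized, directed-edge illustration of what GCNConv does internally, not its actual implementation:

```python
import torch

def gcn_layer(x, edge_index, weight):
    """One graph convolution, following the five steps above.

    x: [num_nodes, in_dim] node features
    edge_index: [2, num_edges] COO edges (row 0 = source, row 1 = target)
    weight: [in_dim, out_dim] shared weight matrix
    """
    num_nodes = x.size(0)

    # 1. Add self-loops so each node is its own neighbor.
    loops = torch.arange(num_nodes)
    src = torch.cat([edge_index[0], loops])
    dst = torch.cat([edge_index[1], loops])

    # 2. Per-edge normalization 1/sqrt(deg(i) * deg(j)); degrees count
    #    incoming edges and include the self-loops, so they are >= 1.
    ones = torch.ones_like(dst, dtype=torch.float)
    deg = torch.zeros(num_nodes).scatter_add_(0, dst, ones)
    norm = deg[src].rsqrt() * deg[dst].rsqrt()

    # 3. Aggregate: normalized sum of neighbor features per target node.
    out = torch.zeros(num_nodes, x.size(1))
    out.index_add_(0, dst, norm.unsqueeze(1) * x[src])

    # 4. Transform with the shared weight matrix, then 5. activate.
    return torch.relu(out @ weight)

x = torch.randn(4, 3)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
out = gcn_layer(x, edge_index, torch.randn(3, 8))
print(out.shape)  # torch.Size([4, 8])
```

In practice you would use GCNConv instead, which does the same steps with sparse matrix multiplication.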

Concrete example: product demand signals from a supply chain graph

Consider a retail database with tables: products, orders, stores, and suppliers.

  • Product nodes: features = [price, weight, category_id]
  • Store nodes: features = [region, size_sqft, foot_traffic]
  • Edges: store → product (stocks), supplier → product (supplies)

After one graph convolution layer, each product node's representation now includes information about which stores carry it (high-traffic vs. low-traffic regions) and which suppliers provide it (reliable vs. delayed). After two layers, the product also absorbs information about other products in the same stores (competitive landscape) and other products from the same supplier (supply chain risk).
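The two-layer reach described above can be checked on a tiny version of this graph (node indices and edges invented for illustration; edges are made symmetric, as GCN typically treats graphs as undirected). Nonzero entries of the squared self-looped adjacency matrix mark exactly the nodes whose features can reach a product after two convolutions:

```python
import torch

# Toy supply-chain graph: nodes 0-1 are products, 2 is a store, 3 a supplier.
# The store stocks both products; the supplier supplies product 0.
edges = [(2, 0), (2, 1), (3, 0)]

n = 4
adj = torch.eye(n)  # self-loops: every node keeps its own features
for a, b in edges:
    adj[a, b] = adj[b, a] = 1.0  # undirected edges

# Nonzero entries of adj^k mark nodes reachable within k convolution layers.
two_hop = (adj @ adj) > 0

# After one layer, product 1 sees only the store (and itself)...
print(bool(adj[1, 0] > 0))  # False: no direct product-to-product edge
# ...after two layers, product 1 absorbs product 0 via the shared store.
print(bool(two_hop[1, 0]))  # True
```

This is the "competitive landscape" effect: two products never connected directly still exchange information through the store that stocks both.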

PyG implementation

graph_convolution_pyg.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.conv1 = GCNConv(num_features, 64)
        self.conv2 = GCNConv(64, num_classes)

    def forward(self, x, edge_index):
        # First convolution: raw features -> 64-dim
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, p=0.5, training=self.training)

        # Second convolution: 64-dim -> num_classes
        x = self.conv2(x, edge_index)
        return x  # logits per node

# GCNConv handles:
# - Self-loop addition (add_self_loops=True by default)
# - Degree normalization (normalize=True by default)
# - Sparse matrix multiplication for efficiency

Two lines of GCNConv give you a full 2-layer graph convolutional network. PyG handles normalization and sparsity.

Graph convolution vs. other GNN operations

  • GCNConv vs. GATConv: GCN uses fixed degree normalization. GAT learns attention weights per edge, letting the model decide which neighbors matter more. GAT is more flexible but computationally heavier.
  • GCNConv vs. SAGEConv: GCN normalizes by both source and target degree. GraphSAGE concatenates the node's own features with the aggregated neighbor features, then applies a linear transform. SAGE is designed for inductive settings where new nodes appear at inference.
  • GCNConv vs. GINConv: GCN uses mean-like aggregation (degree-normalized sum). GIN uses pure sum, achieving maximum expressiveness. GIN can distinguish graph structures that GCN cannot.

Limitations and what comes next

  1. Fixed neighbor weighting: Degree normalization treats all neighbors as equally important after adjusting for degree. In practice, some neighbors carry more signal than others. Attention mechanisms solve this.
  2. Over-smoothing: Stacking more than 5-6 graph convolution layers causes node representations to converge toward one another and become nearly indistinguishable. Skip connections and graph rewiring help.
  3. Expressiveness ceiling: GCNConv cannot distinguish certain non-isomorphic graphs that differ only in subtle structural ways. The Weisfeiler-Leman test defines this bound.

Graph transformers go beyond graph convolution by replacing local neighborhood aggregation with global attention over the entire graph, removing both the depth limitation and the expressiveness ceiling.

Frequently asked questions

What is graph convolution?

Graph convolution is the operation that adapts the convolutional filter from regular grids (images) to irregular graph structures. Each node computes a new representation by taking a degree-normalized weighted sum of its neighbors' features and passing the result through a learnable linear transformation. It is the core operation in Graph Convolutional Networks (GCNs).

How is graph convolution different from image convolution?

Image convolution slides a fixed-size kernel (e.g., 3x3) over a regular pixel grid. Every pixel has the same number of neighbors in the same spatial arrangement. Graph convolution operates on irregular structures where each node has a different number of neighbors with no inherent spatial ordering. The 'kernel' in graph convolution is a shared weight matrix applied to all neighbor features, with degree normalization replacing spatial position.

What is degree normalization in graph convolution?

Degree normalization scales each neighbor's contribution by the inverse square root of both the source and target node degrees: 1/sqrt(deg(i)) * 1/sqrt(deg(j)). This prevents high-degree nodes from dominating the aggregation and ensures that the scale of the aggregated message is independent of neighborhood size. It is a key design choice in GCNConv.

How does graph convolution apply to tabular enterprise data?

Tabular enterprise data stored in relational databases has natural graph structure through foreign keys. Graph convolution on this structure lets each row learn from its related rows in other tables. A customer node convolves over its order neighbors, which convolve over their product neighbors. This captures multi-table patterns that flat-table models miss entirely.

What are the limitations of graph convolution?

Standard graph convolution (GCNConv) has three main limitations: (1) it uses fixed degree normalization rather than learned attention weights, treating all neighbors equally after normalization; (2) it suffers from over-smoothing after 5-6 layers; (3) its expressiveness is bounded by the 1-WL test, meaning it cannot distinguish certain graph structures. GATConv and GINConv address limitations 1 and 3 respectively.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.