Original Paper
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
Chiang et al. (2019). KDD 2019
What ClusterGCNConv does
ClusterGCN follows a two-phase approach:
- Partition: Split the graph into K clusters using METIS or spectral clustering. This is done once before training.
- Train: At each step, sample one or more clusters, extract the subgraph (nodes + intra-cluster edges), and run standard GCN on the subgraph.
Because each cluster is small, the entire subgraph fits in GPU memory. Over an epoch the model trains on every cluster, so it eventually sees all nodes; inter-cluster edges are seen only when their two clusters happen to be sampled into the same batch.
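The mechanics can be sketched in a few lines of plain Python. This is a toy illustration, not the real algorithm: the `cluster_of` assignment stands in for METIS, and `subgraph` mimics the intra-cluster edge filtering:

```python
# Toy graph: 8 nodes, undirected edges as (u, v) pairs.
edges = [(0, 1), (1, 2), (2, 3), (3, 0),   # ring inside cluster 0
         (4, 5), (5, 6), (6, 7), (7, 4),   # ring inside cluster 1
         (0, 4)]                           # one cross-cluster edge

# Phase 1 (once): partition nodes into K clusters.
# Real Cluster-GCN uses METIS; here we just split by node id.
cluster_of = {n: n // 4 for n in range(8)}   # nodes 0-3 -> 0, 4-7 -> 1

# Phase 2 (each step): sample clusters, keep only edges inside the batch.
def subgraph(cluster_ids):
    nodes = {n for n in cluster_of if cluster_of[n] in cluster_ids}
    kept = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, kept

nodes, kept = subgraph({0})
print(len(nodes), len(kept))  # 4 4 -- the (0, 4) edge is dropped
```

Sampling cluster 0 alone yields a 4-node subgraph with 4 edges; the cross-cluster edge `(0, 4)` is invisible until clusters 0 and 1 land in the same batch.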
PyG implementation
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.loader import ClusterData, ClusterLoader

# Step 1: Partition graph into clusters (done once, before training)
cluster_data = ClusterData(data, num_parts=1000)
loader = ClusterLoader(cluster_data, batch_size=20, shuffle=True)
# batch_size=20 means 20 clusters per mini-batch


class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden)
        self.conv2 = GCNConv(hidden, out_channels)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)


model = GCN(dataset.num_features, 256, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
for batch in loader:
    # batch is a subgraph with ~20 clusters
    optimizer.zero_grad()
    out = model(batch.x, batch.edge_index)
    loss = F.cross_entropy(out[batch.train_mask], batch.y[batch.train_mask])
    loss.backward()
    optimizer.step()
```

ClusterData partitions once; ClusterLoader samples cluster groups each step. The GCN model itself is unchanged; the scaling comes from the data loader.
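A detail worth making concrete: when ClusterLoader batches several clusters together, edges *between* clusters in the same batch survive; only edges to clusters outside the batch are cut. A toy sketch (the graph and `batch_edges` helper are illustrative, not PyG API):

```python
# Three clusters of two nodes each, with two cross-cluster edges.
edges = [(0, 1), (2, 3), (0, 2),   # (0, 2) crosses clusters 0 and 1
         (4, 5), (1, 4)]           # (1, 4) crosses clusters 0 and 2
cluster_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}

def batch_edges(cluster_ids):
    # Keep an edge iff both endpoints fall inside the sampled clusters.
    return [(u, v) for u, v in edges
            if cluster_of[u] in cluster_ids and cluster_of[v] in cluster_ids]

print(len(batch_edges({0})))      # 1: only (0, 1) survives
print(len(batch_edges({0, 1})))   # 3: (0, 2) is recovered, (1, 4) still cut
```

This is why the paper recommends sampling multiple clusters per batch (the "stochastic multiple partitions" scheme): larger batches recover more cross-cluster signal.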
When to use ClusterGCNConv
- Large homogeneous graphs. Reddit (232K nodes), OGB-Products (2.4M nodes), and similar large single-type graphs.
- Deep GCN training. ClusterGCN was originally motivated by training deep (5+ layer) GCNs. The bounded memory per batch makes deep models feasible.
- When compute efficiency matters. SAGEConv's neighbor sampling recomputes embeddings for nodes that appear in many overlapping sampled neighborhoods. ClusterGCN avoids this redundancy by computing each node's embedding once per subgraph.
When not to use ClusterGCNConv
- When inter-cluster edges are critical. Graph structures where important connections span cluster boundaries (e.g., fraud networks with long-range connections) lose signal from cluster partitioning.
- Inductive learning. The partitioning is fixed. New nodes require re-partitioning. SAGEConv handles new nodes naturally.
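Whether inter-cluster edges are "critical" can be checked before committing to this approach by measuring the edge cut of a candidate partition. A minimal diagnostic sketch (`edge_cut_fraction` is a hypothetical helper, not part of PyG):

```python
def edge_cut_fraction(edges, cluster_of):
    """Fraction of edges whose endpoints land in different clusters.

    A high value means the partition discards a lot of structure:
    each cut edge is invisible unless its two clusters share a batch.
    """
    if not edges:
        return 0.0
    cut = sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])
    return cut / len(edges)

# Toy check: 1 of 4 edges crosses the cluster boundary.
edges = [(0, 1), (1, 2), (3, 4), (2, 3)]
cluster_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
print(edge_cut_fraction(edges, cluster_of))  # 0.25
```

If the cut fraction stays high even with a good partitioner like METIS, the graph resists clustering and a neighbor-sampling approach may preserve more signal.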
How KumoRFM builds on this
KumoRFM's training infrastructure combines insights from both ClusterGCN and neighbor sampling:
- Intelligent partitioning that respects graph structure and minimizes edge loss across partition boundaries
- Hybrid sampling that combines cluster-level efficiency with node-level flexibility for new entities
- Production-scale optimization that trains on billion-node graphs with bounded memory and predictable latency