Transductive learning in graph neural networks is the setting where the entire graph, including unlabeled test nodes, is present during training, and the model computes representations for every node in that fixed graph at once. The model leverages the structural connections and features of unlabeled nodes to improve its predictions, even though their labels are hidden. This is the default setting for most academic GNN benchmarks and papers, but it has significant limitations for real-world enterprise deployment.
Why it matters for enterprise data
Understanding transductive learning matters because most GNN tutorials and benchmarks use this setting, and naively applying their results to enterprise problems can be misleading.
A GCN achieving 81.5% accuracy on Cora in the transductive setting does so with access to the entire citation network during training. In an enterprise setting, this would mean having every customer, every transaction, and every product present at training time, with no new entities appearing at inference. That is unrealistic for any live business.
When evaluating GNN architectures for enterprise use, ensure benchmarks use inductive evaluation splits (training graph is a strict subset of the full graph) rather than transductive splits (train/val/test masks on the same graph).
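The "strict subset" requirement can be made concrete in a few lines. The sketch below (a toy 6-node graph; the node and edge choices are illustrative) builds an inductive training graph by dropping every edge that touches a held-out node, so held-out nodes contribute neither features nor structure during training:

```python
import torch

# Hypothetical toy graph: 6 nodes, undirected edges stored as directed pairs
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 4, 4, 5],
                           [1, 0, 2, 1, 3, 2, 4, 3, 5, 4]])

# Hold out nodes 4 and 5 for inductive evaluation
train_nodes = torch.tensor([True, True, True, True, False, False])

# Inductive split: keep only edges whose BOTH endpoints are training nodes,
# so the training graph is a strict subset of the full graph
keep = train_nodes[edge_index[0]] & train_nodes[edge_index[1]]
train_edge_index = edge_index[:, keep]

print(train_edge_index.size(1))  # 6 of the 10 directed edges survive
```

A transductive split, by contrast, would keep all 10 edges and merely mask which nodes contribute to the loss.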
How transductive training works
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid

# Load Cora - a single, fixed graph
dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        # NOTE: the ENTIRE graph is used in every forward pass
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)  # all nodes pass through the model
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

# Test: predict on the SAME graph, SAME nodes (just a different mask)
model.eval()
with torch.no_grad():
    pred = model(data.x, data.edge_index).argmax(dim=1)
accuracy = pred[data.test_mask].eq(data.y[data.test_mask]).sum().item() / data.test_mask.sum().item()
```

Classic transductive training: the entire graph (all 2,708 nodes) passes through the model in every forward pass. Only train_mask nodes contribute to the loss.
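The "just a different mask" point is worth making explicit. Planetoid's standard Cora split is three boolean masks over one shared node set (140 train / 500 validation / 1,000 test out of 2,708 nodes); the sketch below reproduces the split sizes with plain tensors, with index positions following the published split:

```python
import torch

# Cora's standard Planetoid split, sketched with plain boolean masks
num_nodes = 2708
train_mask = torch.zeros(num_nodes, dtype=torch.bool)
val_mask = torch.zeros(num_nodes, dtype=torch.bool)
test_mask = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[:140] = True       # 140 labeled training nodes
val_mask[140:640] = True      # 500 validation nodes
test_mask[1708:] = True       # 1,000 test nodes

# The masks partition (part of) ONE node set; nothing is removed from the graph
assert not (train_mask & test_mask).any()
print(train_mask.sum().item(), val_mask.sum().item(), test_mask.sum().item())
```

Every node, test nodes included, still participates in message passing; the masks only control which predictions are scored.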
Transductive vs. inductive: when to use each
- Use transductive: One-time analysis of a static graph. Classifying all nodes in a knowledge graph. Labeling a fixed citation network. Any setting where the node set is complete and known.
- Use inductive: Production systems with evolving data. Customer churn prediction. Fraud detection. Product recommendation. Any setting where new entities appear after training.
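The inductive bullets above all rest on one mechanism: sampling a bounded neighborhood around the nodes being scored instead of running the full graph. A minimal pure-PyTorch sketch of that sampling step (function name and graph are illustrative; production code would use a library loader such as PyG's NeighborLoader):

```python
import torch

def sample_neighbors(edge_index, seed_nodes, k):
    """For each seed node, keep at most k of its incoming edges."""
    kept_src, kept_dst = [], []
    for node in seed_nodes.tolist():
        incoming = (edge_index[1] == node).nonzero(as_tuple=True)[0]
        if incoming.numel() > k:
            perm = torch.randperm(incoming.numel())[:k]
            incoming = incoming[perm]
        kept_src.append(edge_index[0, incoming])
        kept_dst.append(edge_index[1, incoming])
    return torch.stack([torch.cat(kept_src), torch.cat(kept_dst)])

# Toy graph: node 0 has four in-neighbors (1, 2, 3, 4)
edge_index = torch.tensor([[1, 2, 3, 4, 0, 0],
                           [0, 0, 0, 0, 1, 2]])
sub = sample_neighbors(edge_index, torch.tensor([0]), k=2)
print(sub.size(1))  # 2: node 0's in-edges are capped at k=2
```

Because each batch touches only a sampled subgraph, memory is bounded by the fan-out per layer rather than the full graph size.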
Limitations and what comes next
- Cannot handle new nodes: The fundamental limitation. New customers, new transactions, new products cannot be scored without retraining or at least fine-tuning.
- Memory scales with graph size: The entire graph must fit in memory during training. For enterprise graphs with millions of nodes, this requires distributed training or subgraph sampling (which shifts toward inductive learning).
- Benchmark inflation: Transductive accuracy numbers are higher than inductive ones because the model has access to test node features and structure. This overstates real-world performance.
For enterprise production, inductive learning with neighbor sampling is the standard approach. KumoRFM takes this further with a foundation model that generalizes across entirely different relational databases.