
Transductive Learning: GNNs That Require the Full Graph at Training Time

Transductive learning leverages the structure and features of all nodes, including unlabeled ones, during training. It is the standard setting for GNN benchmarks but has critical limitations for production enterprise systems.

PyTorch Geometric

TL;DR

  1. Transductive learning uses the entire graph (labeled + unlabeled nodes) during training. The model sees test nodes' features and connections, just not their labels.
  2. Most GNN benchmarks (Cora, CiteSeer, PubMed) are transductive: semi-supervised classification on a fixed graph. This is why benchmark results do not always transfer to production.
  3. Transductive models cannot predict on new nodes without retraining. This is a dealbreaker for enterprise systems where new customers, transactions, and products arrive continuously.
  4. Methods like node2vec and DeepWalk are purely transductive: they learn fixed per-node embeddings. Parameterized GNN layers (GCNConv, GATConv) can be used in either setting.
  5. For production enterprise systems, inductive learning is almost always preferred. Transductive learning is appropriate only for one-time analysis on static graph snapshots.

Transductive learning in graph neural networks is the setting where the entire graph, including unlabeled test nodes, is present during training, and the model learns fixed representations for all nodes simultaneously. The model leverages the structural connections and features of unlabeled nodes to improve its predictions, even though their labels are hidden. This is the default setting for most academic GNN benchmarks and papers, but it has significant limitations for real-world enterprise deployment.

Why it matters for enterprise data

Understanding transductive learning matters because most GNN tutorials and benchmarks use this setting, and naively applying their results to enterprise problems can be misleading.

A GCN achieving 81.5% accuracy on Cora in the transductive setting does so with access to the entire citation network during training. In an enterprise setting, this would mean having every customer, every transaction, and every product present at training time, with no new entities appearing at inference. That is unrealistic for any live business.

When evaluating GNN architectures for enterprise use, ensure benchmarks use inductive evaluation splits (training graph is a strict subset of the full graph) rather than transductive splits (train/val/test masks on the same graph).

How transductive training works

transductive_training.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid

# Load Cora - a single fixed graph
dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self):
        # NOTE: the ENTIRE graph is used in every forward pass
        x = F.relu(self.conv1(data.x, data.edge_index))
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, data.edge_index)
        return F.log_softmax(x, dim=1)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    model.train()
    optimizer.zero_grad()  # clear gradients from the previous step
    out = model()  # all nodes pass through the model
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

# Test: predict on SAME graph, SAME nodes (just different mask)
model.eval()
with torch.no_grad():
    pred = model().argmax(dim=1)
accuracy = (pred[data.test_mask] == data.y[data.test_mask]).float().mean().item()

Classic transductive training: the entire graph (all 2,708 nodes) passes through the model every forward pass. Only train_mask nodes contribute to the loss.

Transductive vs. inductive: when to use each

  • Use transductive: One-time analysis of a static graph. Classifying all nodes in a knowledge graph. Labeling a fixed citation network. Any setting where the node set is complete and known.
  • Use inductive: Production systems with evolving data. Customer churn prediction. Fraud detection. Product recommendation. Any setting where new entities appear after training.

Limitations and what comes next

  1. Cannot handle new nodes: The fundamental limitation. New customers, new transactions, new products cannot be scored without retraining or at least fine-tuning.
  2. Memory scales with graph size: The entire graph must fit in memory during training. For enterprise graphs with millions of nodes, this requires distributed training or subgraph sampling (which shifts toward inductive learning).
  3. Benchmark inflation: Transductive accuracy numbers are higher than inductive ones because the model has access to test node features and structure. This overstates real-world performance.

For enterprise production, inductive learning with neighbor sampling is the standard approach. KumoRFM takes this further with a foundation model that generalizes across entirely different relational databases.
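PyG ships `NeighborLoader` for this, but the core idea of neighbor sampling is simple enough to sketch in plain PyTorch: each batch carries only sampled neighborhoods of the target nodes, so memory is bounded by the batch size and fanout rather than the full graph. The helper name `sample_neighbors` and all sizes below are illustrative:

```python
import torch

# Synthetic stand-in graph (sizes are illustrative)
num_nodes, num_feats, fanout = 100, 16, 5
x = torch.randn(num_nodes, num_feats)
edge_index = torch.randint(0, num_nodes, (2, 500))

def sample_neighbors(seeds, edge_index, fanout):
    """Keep at most `fanout` randomly chosen incoming edges per seed node."""
    kept = []
    for s in seeds.tolist():
        incoming = (edge_index[1] == s).nonzero(as_tuple=True)[0]
        kept.append(incoming[torch.randperm(incoming.numel())[:fanout]])
    return edge_index[:, torch.cat(kept)]

seeds = torch.arange(10)                         # the batch of target nodes
sub_edges = sample_neighbors(seeds, edge_index, fanout)
# sub_edges has at most 10 * fanout edges; the forward pass touches only
# this sampled subgraph, not all 100 nodes.
```

Stacking one such sampling step per GNN layer gives the multi-hop fanouts (e.g. `num_neighbors=[10, 5]`) that `NeighborLoader` automates.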

Frequently asked questions

What is transductive learning in GNNs?

Transductive learning is a setting where the entire graph, including all nodes (labeled and unlabeled), is available during training. The model leverages the structure and features of unlabeled test nodes to improve predictions, even though their labels are not used. This is the standard setting for academic GNN benchmarks like Cora, CiteSeer, and PubMed.

Why is transductive learning common in GNN research?

Most GNN benchmark datasets (Cora, CiteSeer, PubMed) are single fixed graphs where the task is semi-supervised node classification: some nodes are labeled, others are not. The model sees the entire graph during training and predicts labels for the unlabeled nodes. This transductive setting is simpler to implement and evaluate than inductive settings.

What is the disadvantage of transductive learning?

Transductive models cannot predict on nodes that were not present during training. If a new node appears, the entire model must be retrained or fine-tuned. This makes transductive approaches impractical for production systems where new entities (customers, transactions, products) arrive continuously. Additionally, transductive methods like node2vec store per-node embeddings, consuming memory proportional to graph size.
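The memory cost of per-node embeddings is easy to see: node2vec-style methods keep one learned vector per node in a fixed lookup table, and an unseen node simply has no row. A minimal sketch (sizes chosen to match Cora for illustration):

```python
import torch

# node2vec-style storage: one learned vector per node in a lookup table
num_nodes, dim = 2708, 128            # e.g. a Cora-sized graph
embedding = torch.nn.Embedding(num_nodes, dim)

# Scoring an existing node is a table lookup:
vec = embedding(torch.tensor([42]))   # shape: (1, 128)

# A brand-new node (id 2708) has no row in the table, so it cannot be
# scored without growing the table and retraining. The table itself
# costs memory proportional to the node count:
params = num_nodes * dim              # 346,624 learned parameters
```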

When is transductive learning appropriate?

Transductive learning is appropriate when the graph is static and all nodes of interest are known at training time. Examples: classifying all papers in a fixed citation network, labeling all nodes in a static knowledge graph, or performing one-time analysis on a snapshot of a social network. It is not appropriate for real-time prediction on evolving enterprise databases.

Can GCNConv be used transductively or inductively?

GCNConv itself is a parameterized layer that can be used in both settings. In the transductive setting, the entire graph (including test nodes) is passed through GCNConv during training. In the inductive setting, only the training subgraph is used during training, and the learned weights are applied to new nodes at inference. The layer is the same; the training protocol differs.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.