
Transfer Learning: Applying GNN Knowledge from One Graph to Another

Transfer learning reuses knowledge from a pre-trained GNN on a new graph or task. This is how you get strong performance on a small enterprise dataset by leveraging patterns learned from large public or cross-domain datasets.


TL;DR

  • Transfer learning applies knowledge from a source graph/task to a target graph/task. The model transfers structural understanding: neighborhood patterns, subgraph motifs, aggregation behaviors.
  • Three approaches: fine-tuning (update all weights on target data), feature extraction (freeze encoder, train new classifier), and adapters (freeze most weights, add small trainable modules).
  • Works best between structurally similar graphs. Molecular-to-molecular and social-to-social transfer well. Cross-domain transfer is harder but possible for general structural patterns.
  • Foundation models are the ultimate transfer: KumoRFM transfers relational patterns learned from diverse databases to any new database, achieving strong zero-shot predictions.
  • Enterprise use case: pre-train on a large public graph, fine-tune on your small proprietary graph. This is especially valuable when labeled data is scarce.

Transfer learning applies GNN knowledge learned on one graph to improve performance on a different graph. A GNN pre-trained on 2 million molecules from ChEMBL learns general chemical patterns: what ring structures mean, how functional groups affect properties, how molecular size relates to solubility. When fine-tuned on your proprietary dataset of 500 drug candidates, this pre-trained knowledge gives a massive head start compared to training from scratch.

This is the same principle that makes ImageNet pre-training valuable for medical imaging and BERT pre-training valuable for legal text. The key question is: what transfers between graphs?

What transfers

  • Aggregation patterns: how to combine neighbor information effectively (transfers across all graph types)
  • Structural motifs: triangles, rings, cliques, stars, and their significance (transfers within domains)
  • Feature interactions: how node features combine with neighborhood structure (transfers between similar feature spaces)
  • Scale patterns: how graph size, density, and degree distributions correlate with predictions (transfers broadly)

Three transfer approaches

transfer_learning.py
import torch

hidden_dim, num_target_classes = 64, 2

# Pre-trained encoder (weights learned on a large source dataset)
pretrained_encoder = torch.load('molecular_gnn.pt')

# Approach 1: Fine-tuning (update everything)
class FineTuneModel(torch.nn.Module):
    def __init__(self, encoder, num_target_classes):
        super().__init__()
        self.encoder = encoder
        self.head = torch.nn.Linear(hidden_dim, num_target_classes)

    def forward(self, x, edge_index):
        return self.head(self.encoder(x, edge_index))

model = FineTuneModel(pretrained_encoder, num_target_classes=2)
# Train with a small learning rate (e.g. 1e-4) on target data;
# all encoder weights update gradually.

# Approach 2: Feature extraction (freeze encoder)
for param in pretrained_encoder.parameters():
    param.requires_grad = False
classifier = torch.nn.Linear(hidden_dim, num_target_classes)
# Only the classifier is trained on target data.

# Approach 3: Adapters (freeze most weights, add small trainable modules)
class Adapter(torch.nn.Module):
    def __init__(self, hidden_dim, bottleneck=16):
        super().__init__()
        self.down = torch.nn.Linear(hidden_dim, bottleneck)
        self.up = torch.nn.Linear(bottleneck, hidden_dim)

    def forward(self, x):
        # Residual bottleneck: only the down/up projections train.
        return x + self.up(self.down(x).relu())
# Insert an Adapter after each frozen message-passing layer.

Three transfer strategies. Fine-tuning is the default; use feature extraction for very small target sets, and adapters for efficient multi-task transfer.
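The freezing pattern above has a practical consequence worth showing: only parameters with requires_grad=True should go to the optimizer, and frozen layers receive no gradients. A minimal sketch, using a plain Linear stack as a stand-in for a pre-trained GNN encoder (the layer sizes and learning rate here are illustrative, not from the article):

```python
import torch

# Stand-in model: layer 0 plays the role of the frozen pre-trained
# encoder, layer 2 is the new task head trained from scratch.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),   # "encoder" (frozen below)
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),    # new task head
)
for p in model[0].parameters():
    p.requires_grad = False

# Pass only trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

x = torch.randn(8, 64)           # dummy target-graph features
y = torch.randint(0, 2, (8,))    # dummy labels
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

# The frozen "encoder" received no gradient; the head did.
assert model[0].weight.grad is None
assert model[2].weight.grad is not None
```

For fine-tuning (Approach 1), the same loop applies without the freezing step, typically with the small learning rate noted above.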

Enterprise example: cross-company fraud patterns

A fintech startup has 100,000 transactions and 200 confirmed fraud cases. Training a GNN from scratch on this small dataset overfits quickly. Transfer learning:

  1. Pre-train a GNN on a large public transaction dataset (or a synthetic dataset with known fraud patterns)
  2. The model learns general fraud patterns: unusual degree distributions, temporal velocity anomalies, fan-out/fan-in structures
  3. Fine-tune on the startup's 100,000 transactions with 200 labels
  4. The transferred model achieves 85% AUROC vs 65% for training from scratch

The improvement comes from the pre-trained model already understanding what “suspicious graph structure” looks like. Fine-tuning just adapts this understanding to the specific transaction patterns of the startup.
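In practice, the source and target models differ in their output heads (the public dataset's label space is not fraud/not-fraud), so the pre-trained checkpoint is loaded partially. A hedged sketch of that step, using a toy two-layer module with assumed layer names (`encoder`, `head`) and PyTorch's `load_state_dict(strict=False)`:

```python
import torch

# Toy module: `encoder` stands in for the message-passing layers,
# `head` for the task-specific output layer.
class GNNStub(torch.nn.Module):
    def __init__(self, out_dim):
        super().__init__()
        self.encoder = torch.nn.Linear(16, 32)
        self.head = torch.nn.Linear(32, out_dim)

source = GNNStub(out_dim=10)   # e.g. trained on a large public dataset
target = GNNStub(out_dim=2)    # fraud / not-fraud on the startup's data

# Copy only the encoder weights; the size-mismatched head stays
# randomly initialized and is learned during fine-tuning.
state = {k: v for k, v in source.state_dict().items()
         if k.startswith('encoder')}
missing, unexpected = target.load_state_dict(state, strict=False)
assert set(missing) == {'head.weight', 'head.bias'} and not unexpected
```

`strict=False` reports which keys were skipped, which is a useful sanity check that only the head (and nothing else) is being trained from scratch.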

When transfer fails

  • Domain mismatch: transferring molecular patterns to social networks. The structural patterns are too different.
  • Feature mismatch: source and target have completely different feature semantics. Structural knowledge can still transfer if the input layers are retrained to map the new features into the space the encoder expects.
  • Scale mismatch: pre-training on small graphs, deploying on enormous ones. The model may not have learned patterns relevant to large-scale structure.
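For the feature-mismatch case, one common workaround (an assumption here, not a claim from the article) is to keep the pre-trained encoder frozen and train a small input projection that maps the target's features into the dimensionality the encoder expects. A minimal sketch with illustrative dimensions:

```python
import torch

source_dim, target_dim, hidden = 9, 23, 64

# Stand-in for the frozen pre-trained GNN encoder.
encoder = torch.nn.Linear(source_dim, hidden)
for p in encoder.parameters():
    p.requires_grad = False

# Trainable projection: target feature space -> source feature space.
project = torch.nn.Linear(target_dim, source_dim)
head = torch.nn.Linear(hidden, 2)   # trainable task head

x_target = torch.randn(5, target_dim)
logits = head(encoder(project(x_target)))
assert logits.shape == (5, 2)
```

This preserves the encoder's structural knowledge while sidestepping the incompatible feature semantics; only the projection and head see gradient updates.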

Frequently asked questions

What is transfer learning on graphs?

Transfer learning on graphs applies knowledge learned from one graph or task to improve performance on a different graph or task. A GNN pre-trained on millions of molecules transfers structural knowledge to a drug discovery task with only hundreds of labeled molecules. The model's understanding of graph patterns (ring structures, branching, connectivity) transfers even when the specific graphs differ.

What transfers between graphs?

Structural patterns transfer: how neighborhoods aggregate, what subgraph motifs mean, how degree distributions correlate with properties. Domain-specific semantics (like specific node feature meanings) transfer less well. The more structurally similar the source and target graphs, the better the transfer.

How do you perform transfer learning with GNNs?

Three approaches: (1) Fine-tuning: pre-train on a large source graph, then fine-tune all weights on the target graph. (2) Feature extraction: freeze the pre-trained encoder, train only a new classifier on target data. (3) Adapter: freeze most weights, add small trainable adapter layers for the target domain.

Does transfer learning work across different graph types?

Transfer works best between structurally similar graphs (molecular to molecular, social to social). Cross-domain transfer (molecular to social) is harder because structural patterns differ. However, very general patterns (degree normalization, neighborhood aggregation) do transfer across domains, which is why graph foundation models like KumoRFM work.

What is the relationship between transfer learning and foundation models?

Foundation models are the ultimate form of graph transfer learning. KumoRFM is pre-trained on diverse relational databases, learning general relational patterns. When applied to a new database, it transfers this general knowledge. Zero-shot transfer (no target training) achieves 76.71 AUROC on RelBench, outperforming models trained from scratch.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.