Transfer learning applies GNN knowledge learned on one graph to improve performance on a different graph. A GNN pre-trained on 2 million molecules from ChEMBL learns general chemical patterns: what ring structures mean, how functional groups affect properties, how molecular size relates to solubility. When fine-tuned on your proprietary dataset of 500 drug candidates, this pre-trained knowledge gives a massive head start compared to training from scratch.
This is the same principle that makes ImageNet pre-training valuable for medical imaging and BERT pre-training valuable for legal text. The key question is: what transfers between graphs?
What transfers
- Aggregation patterns: how to combine neighbor information effectively (transfers across all graph types)
- Structural motifs: triangles, rings, cliques, stars, and their significance (transfers within domains)
- Feature interactions: how node features combine with neighborhood structure (transfers between similar feature spaces)
- Scale patterns: how graph size, density, and degree distributions correlate with predictions (transfers broadly)
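The first point can be made concrete with a minimal sketch in plain PyTorch (hypothetical toy graphs, no PyG dependency): the learned weights of a mean-aggregation message-passing layer are graph-agnostic, so the same parameters apply unchanged to any graph whose node features share the input dimension.

```python
import torch

torch.manual_seed(0)
W = torch.nn.Linear(8, 16)  # learned once, e.g. on a large source graph

def mean_aggregate_layer(x, edge_index):
    """One mean-aggregation step: h_i = relu(W * mean over neighbors j of x_j)."""
    n = x.size(0)
    src, dst = edge_index
    agg = torch.zeros(n, x.size(1))
    agg.index_add_(0, dst, x[src])                               # sum neighbor features
    deg = torch.zeros(n).index_add_(0, dst, torch.ones(src.size(0)))
    agg = agg / deg.clamp(min=1).unsqueeze(1)                    # mean over neighbors
    return W(agg).relu()

# Two structurally different graphs, same feature dimension
triangle = torch.tensor([[0, 1, 2], [1, 2, 0]])                  # 3-cycle
star = torch.tensor([[1, 2, 3, 4], [0, 0, 0, 0]])                # hub with 4 leaves

h_tri = mean_aggregate_layer(torch.randn(3, 8), triangle)        # shape (3, 16)
h_star = mean_aggregate_layer(torch.randn(5, 8), star)           # shape (5, 16)
```

Nothing in `W` depends on the edge structure, which is why aggregation patterns transfer across graph types while feature-dependent knowledge does not.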
Three transfer approaches
```python
import torch
from torch_geometric.nn import GCNConv

# Pre-trained encoder (from a large source dataset);
# load_pretrained_gnn and FineTuneModel are illustrative helpers
pretrained_encoder = load_pretrained_gnn('molecular_gnn.pt')
hidden_dim, num_target_classes = 64, 2

# Approach 1: Fine-tuning (update everything)
model = FineTuneModel(pretrained_encoder, num_target_classes=2)
# Train with a small learning rate (1e-4) on target data;
# all encoder weights update gradually

# Approach 2: Feature extraction (freeze encoder)
for param in pretrained_encoder.parameters():
    param.requires_grad = False
classifier = torch.nn.Linear(hidden_dim, num_target_classes)
# Only train the classifier on target data

# Approach 3: Adapter (freeze most, add small trainable modules)
class Adapter(torch.nn.Module):
    def __init__(self, hidden_dim, bottleneck=16):
        super().__init__()
        self.down = torch.nn.Linear(hidden_dim, bottleneck)
        self.up = torch.nn.Linear(bottleneck, hidden_dim)

    def forward(self, x):
        # Residual bottleneck: only the down/up projections are trained
        return x + self.up(self.down(x).relu())

# Insert an adapter after each frozen layer
```

Three transfer strategies: fine-tuning is the default; feature extraction suits very small target sets; adapters enable efficient multi-task transfer.
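The fine-tuning loop for approach 1 can be sketched as follows, assuming the model splits into a pre-trained encoder and a fresh task head (stand-in `Linear` modules here, since the real encoder depends on your checkpoint). Optimizer parameter groups give the encoder a smaller learning rate than the new head, so the pre-trained weights drift slowly:

```python
import torch

encoder = torch.nn.Linear(8, 16)   # stand-in for the pre-trained GNN encoder
head = torch.nn.Linear(16, 2)      # new task head, randomly initialized

optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-4},   # gentle updates to pre-trained weights
    {"params": head.parameters(), "lr": 1e-3},      # the new head learns faster
])

x, y = torch.randn(32, 8), torch.randint(0, 2, (32,))
for _ in range(5):                 # a few fine-tuning steps on target data
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(head(encoder(x).relu()), y)
    loss.backward()
    optimizer.step()
```

Setting the encoder's learning rate to zero in the first group recovers approach 2 (feature extraction) without touching `requires_grad`.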
Enterprise example: cross-company fraud patterns
A fintech startup has 100,000 transactions and 200 confirmed fraud cases. Training a GNN from scratch on this small dataset overfits quickly. Transfer learning:
- Pre-train a GNN on a large public transaction dataset (or a synthetic dataset with known fraud patterns)
- The model learns general fraud patterns: unusual degree distributions, temporal velocity anomalies, fan-out/fan-in structures
- Fine-tune on the startup's 100,000 transactions with 200 labels
- The transferred model achieves 85% AUROC vs 65% for training from scratch
The improvement comes from the pre-trained model already understanding what “suspicious graph structure” looks like. Fine-tuning just adapts this understanding to the specific transaction patterns of the startup.
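One practical detail hidden in the fine-tuning step: with 200 fraud cases in 100,000 transactions, an unweighted loss lets the model score well by predicting "legitimate" everywhere. A sketch of one common fix, weighting the rare class via `pos_weight` in `BCEWithLogitsLoss` (batch contents here are made up for illustration):

```python
import torch

n_legit, n_fraud = 99_800, 200
pos_weight = torch.tensor([n_legit / n_fraud])   # ~499x weight on the fraud class

criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.randn(16)                  # scores from the fine-tuned GNN head
labels = torch.zeros(16)
labels[0] = 1.0                           # one fraud case in this batch
loss = criterion(logits, labels)          # fraud errors now dominate the gradient
```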
When transfer fails
- Domain mismatch: transferring molecular patterns to social networks. The structural patterns are too different.
- Feature mismatch: source and target features have completely different semantics. Transfer can still work if you re-learn the input mapping while keeping the structural layers (discard the feature knowledge, keep the structure knowledge).
- Scale mismatch: pre-training on small graphs, deploying on enormous ones. The model may not have learned patterns relevant to large-scale structure.
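For the feature-mismatch case, a common workaround is to keep the frozen message-passing layers (the structural knowledge) and train only a new input projection that maps the target's feature space into the dimension the encoder expects. A minimal sketch, with all names and dimensions hypothetical:

```python
import torch

encoder_in_dim = 32                 # what the pre-trained encoder expects
target_feat_dim = 7                 # the target graph's unrelated feature space

# New, trainable projection; the pre-trained encoder stays frozen, so only
# this layer (and the task head) learn on target data, and the transferred
# structural patterns survive the change in feature semantics.
new_input_proj = torch.nn.Linear(target_feat_dim, encoder_in_dim)

x_target = torch.randn(10, target_feat_dim)
x_projected = new_input_proj(x_target)   # now compatible with the frozen encoder
```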