Berlin Tech Meetup: The Future of Relational Foundation Models, Systems, and Real-World Applications
PyTorch Geometric
PyTorch Geometric (PyG) is the most widely used library for graph machine learning. Created by Matthias Fey, founding engineer at Kumo.ai, it ships 66 GNN layers and 120+ datasets and is used by researchers and engineers at every major tech company. This guide shows you how to use it.
23.6K
GitHub Stars
21M+
PyPI Downloads
66
GNN Layers
120+
Datasets
Getting Started
Install PyG, load a dataset, train your first GNN. Three steps.
PyG requires PyTorch. Install both, then verify.
pip install torch torch-geometric
# Verify installation
python -c "import torch_geometric; print(torch_geometric.__version__)"
# 2.7.0

The Cora citation network is the “Hello World” of graph ML: 2,708 papers, 10,556 citation edges, 7 classes.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid
# Load Cora dataset
dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]
# Define a 2-layer GCN
class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)
# Train
model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

Check accuracy on the test set. With this basic 2-layer GCN, you should see ~81% accuracy on Cora.
model.eval()
out = model(data.x, data.edge_index)
pred = out.argmax(dim=1)
correct = (pred[data.test_mask] == data.y[data.test_mask]).sum()
acc = correct / data.test_mask.sum()
print(f"Test accuracy: {acc:.4f}")
# Test accuracy: ~0.8150

That is your first GNN. From here, explore attention layers for higher accuracy, heterogeneous graphs for enterprise data, or use-case blueprints for your specific business problem.
How it works
Graph neural networks learn from the structure of your data. Here is the path from raw database tables to production predictions.
Every relational database is a graph hiding in plain sight. Customers, orders, products, transactions. Rows are nodes. Foreign keys are edges. PyG makes this explicit.
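The rows-are-nodes, foreign-keys-are-edges mapping can be sketched in a few lines. The toy `orders` table below is illustrative, not a real schema:

```python
import torch

# Toy "orders" table: each row is (order_id, customer_id),
# where customer_id is the foreign key into the customers table.
orders = [(0, 2), (1, 2), (2, 0), (3, 1)]

# Each foreign key becomes one customer -> order edge
src = torch.tensor([customer_id for _, customer_id in orders])
dst = torch.tensor([order_id for order_id, _ in orders])
edge_index = torch.stack([src, dst])  # shape [2, num_orders]
```

In PyG, `edge_index` is exactly this `[2, num_edges]` tensor of (source, target) pairs; typed node and edge handling for multi-table schemas comes later with heterogeneous graphs.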
Building graphs from databases →

Each GNN layer aggregates information from neighboring nodes. Stack 2-3 layers and every node sees its multi-hop neighborhood. The model discovers patterns that flat tables destroy.
Start with GCNConv →

66 layers, each designed for different scenarios. GCNConv for baselines. GATConv when neighbors vary in importance. HGTConv for multi-table enterprise data. We help you pick.
Architecture decision guide →

Scaling, serving, temporal handling, explainability. The gap between a notebook and a production system is where most GNN projects die. We cover every step.
Production patterns →

66 GNN Layers
Not all GNN layers are equal. Some average neighbors. Others learn attention weights. Some handle multiple table types. Here is how they group by purpose.
The foundations. Every GNN project begins with one of these.
When not all neighbors are equally important.
Multiple table types, typed edges, relational databases.
Full attention. Long-range dependencies. The frontier.
When your graph has millions or billions of nodes.
120+ Concepts
From message passing to graph transformers, from over-smoothing to relational deep learning. Every concept explained with enterprise data examples.
Message Passing
The foundation of all GNN computation
Heterogeneous Graph
Multiple node and edge types. Enterprise reality.
Over-Smoothing
Why 2-3 GNN layers is often optimal
Link Prediction
Recommendations, fraud detection, knowledge graphs
Graph Transformer
Full attention over graph nodes
Relational Deep Learning
Learning directly on relational databases
Neighbor Sampling
Scaling GNNs to billion-node graphs
Data Leakage
The silent killer of graph ML projects
30 Business Problems
Each blueprint shows the business problem, the relational schema, the PyG architecture, working code, and a one-line KumoRFM alternative.
Fraud Detection
Detect fraud rings, not just anomalous transactions
Recommendations
Graph-based recs that beat collaborative filtering
Churn Prediction
Find the customers flat models miss
Credit Risk
Network-aware scoring beyond traditional scorecards
AML
Detect layering chains through shell companies
Demand Forecasting
Cross-store, cross-product graph signals
Drug Discovery
Molecular property prediction with NNConv
Entity Resolution
Graph-based record matching at scale
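Several of these blueprints (recommendations, fraud, knowledge graphs) reduce to link prediction: score a candidate edge by combining the embeddings of its two endpoints, most simply with a dot product. A sketch with random stand-in embeddings (in practice they come from your trained GNN):

```python
import torch

z = torch.randn(100, 32)  # hypothetical: 100 nodes, 32-dim GNN embeddings

# Three candidate edges, given as (source, target) node-id pairs
candidate_edges = torch.tensor([[0, 5, 17], [42, 3, 99]])
src, dst = candidate_edges

scores = (z[src] * z[dst]).sum(dim=-1)  # dot product per pair -> raw logits
probs = torch.sigmoid(scores)           # probability each edge exists
```

Train the same scoring function with binary cross-entropy against observed edges (positives) and sampled non-edges (negatives).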
15 Production Patterns
The gap between a working GNN and a production system is where most projects die. These guides cover scaling, serving, temporal handling, explainability, and integration.
120+ Datasets
From the 34-node Karate Club to the 111M-node OGB-Papers100M. Each page shows stats, PyG loading code, common tasks, and how the dataset connects to real business problems.
KumoRFM uses production-grade graph transformers under the hood. You write one line of PQL. No PyG code, no training loop, no infrastructure. Same accuracy, zero engineering time.