
Relational Graph Transformers: Attention for Relational Databases

A relational graph transformer applies type-aware attention to heterogeneous graphs built from relational databases. Each table is a node type, each foreign key is an edge type, and the transformer learns cross-table patterns that flat-table ML cannot see.


TL;DR

  • A relational graph transformer extends graph transformers to heterogeneous relational data. Each database table is a node type, each foreign key is an edge type, and attention is type-aware.
  • It handles the three challenges of relational data: heterogeneous types (different columns per table), temporal ordering (causal constraints from timestamps), and scale (millions of rows across dozens of tables).
  • On the RelBench benchmark (7 databases, 30 tasks, 103M rows), relational graph transformers achieve 81.14 AUROC fine-tuned vs 62.44 for flat-table LightGBM.
  • KumoRFM is a pre-trained relational graph transformer that makes zero-shot predictions on new databases. No training, no feature engineering, no data science team required.
  • The architecture combines type-specific projections, relational attention heads, temporal encodings, and causal masking into a single end-to-end model.

A relational graph transformer is a graph transformer architecture specialized for data from relational databases. Enterprise data lives in relational databases: customers in one table, orders in another, products in a third, connected by foreign keys. A relational graph transformer converts this schema into a heterogeneous temporal graph and applies type-aware attention to learn cross-table patterns automatically.

This is the architecture behind KumoRFM, which achieves state-of-the-art results on the RelBench benchmark. It handles the three core challenges of enterprise relational data: heterogeneous types, temporal ordering, and massive scale.

From database to graph

The conversion from relational database to graph is mechanical:

  • Each table becomes a node type (customers, orders, products)
  • Each row becomes a node with features from its columns
  • Each foreign key becomes an edge type (customer placed order, order contains product)
  • Each timestamp column defines causal ordering for that node type
database_to_graph.py
# Conceptual: how a relational graph transformer sees your database

# customers table -> 'customer' node type
#   columns: customer_id, age, region, signup_date
#   -> node features: [age, region_embedding, days_since_signup]

# orders table -> 'order' node type
#   columns: order_id, customer_id, amount, order_date
#   -> node features: [amount_normalized, day_of_week, month]

# products table -> 'product' node type
#   columns: product_id, price, category, rating
#   -> node features: [price_normalized, category_embedding, rating]

# Foreign keys -> edge types
# orders.customer_id -> customers.customer_id
#   => ('order', 'placed_by', 'customer') edges
# order_items.product_id -> products.product_id
#   => ('order', 'contains', 'product') edges

# Causal constraint: order_date determines temporal ordering
# At prediction time t, only orders before t are visible

A relational graph transformer reads your database schema and builds the graph automatically. No manual graph construction.
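To make the conversion concrete, here is a minimal sketch of the table-to-graph mapping in plain Python. The dicts stand in for what PyG's HeteroData would hold (node features per type, edge index per edge type); the tables and column values are toy examples, not a real schema.

```python
# Toy relational tables (illustrative rows and columns).
customers = [
    {"customer_id": 1, "age": 34, "region": "EU"},
    {"customer_id": 2, "age": 28, "region": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 99.0},
    {"order_id": 11, "customer_id": 1, "amount": 15.5},
    {"order_id": 12, "customer_id": 2, "amount": 42.0},
]

# Each table -> a node type; each row -> a node, indexed by position.
node_features = {
    "customer": [[c["age"]] for c in customers],
    "order": [[o["amount"]] for o in orders],
}

# Each foreign key -> an edge type, stored as (src_index, dst_index) pairs.
cust_index = {c["customer_id"]: i for i, c in enumerate(customers)}
edges = {
    ("order", "placed_by", "customer"): [
        (i, cust_index[o["customer_id"]]) for i, o in enumerate(orders)
    ],
}

print(edges[("order", "placed_by", "customer")])  # [(0, 0), (1, 0), (2, 1)]
```

In PyG the same structure lands in `data['customer'].x`, `data['order'].x`, and `data['order', 'placed_by', 'customer'].edge_index`.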

Architecture components

Type-specific projections

Each node type has different columns with different semantics. Type-specific linear projections map each table's features into a shared hidden dimension:

  • Customer features (3 dims) projected to 128 dims
  • Order features (3 dims) projected to 128 dims
  • Product features (3 dims) projected to 128 dims
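A minimal NumPy sketch of the projection step (dimensions and random matrices are illustrative stand-ins; in a real model each projection would be a learned linear layer per node type):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 128  # shared hidden dimension

# One projection matrix per node type; input dims differ per table.
in_dims = {"customer": 3, "order": 3, "product": 3}
proj = {t: rng.standard_normal((d, hidden)) for t, d in in_dims.items()}

# Raw per-table features (toy values).
feats = {
    "customer": rng.standard_normal((2, 3)),  # 2 customers x 3 features
    "order": rng.standard_normal((5, 3)),     # 5 orders x 3 features
    "product": rng.standard_normal((4, 3)),   # 4 products x 3 features
}

# Project every node type into the shared hidden space.
hidden_states = {t: feats[t] @ proj[t] for t in feats}
for t, h in hidden_states.items():
    print(t, h.shape)  # every type ends up (num_nodes, 128)
```

Once every table lives in the same 128-dimensional space, attention can mix information across types.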

Relational attention

Attention heads are relation-aware. A customer attending to its orders uses different attention weights than a customer attending to its support tickets. Each edge type has learned query-key projections that capture relation-specific importance patterns.
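A single-head NumPy sketch of relation-specific attention, assuming one learned query/key projection pair per edge type (the random matrices below are stand-ins for learned weights):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # hidden dim

# One learned (query, key) projection pair per edge type.
W_q = {"placed_by": rng.standard_normal((d, d)),
       "opened_ticket": rng.standard_normal((d, d))}
W_k = {"placed_by": rng.standard_normal((d, d)),
       "opened_ticket": rng.standard_normal((d, d))}

def relation_attention(dst, srcs, rel):
    """Attention weights of one node over its neighbors under edge type `rel`."""
    q = dst @ W_q[rel]            # (d,)
    k = srcs @ W_k[rel]           # (n, d)
    scores = k @ q / np.sqrt(d)   # (n,)
    w = np.exp(scores - scores.max())
    return w / w.sum()            # softmax over the neighborhood

customer = rng.standard_normal(d)
orders = rng.standard_normal((3, d))
w = relation_attention(customer, orders, "placed_by")
print(w.round(3), w.sum())  # weights over 3 orders, summing to 1
```

Because `W_q` and `W_k` are indexed by edge type, the same customer embedding attends differently to orders than to support tickets.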

Temporal encodings and causal masking

Timestamp columns are converted into temporal encodings (similar to positional encodings in language models). Causal masking ensures that when predicting at time t, attention weights are zero for nodes with timestamps after t. This prevents data leakage by construction.
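A NumPy sketch of both pieces: a sinusoidal temporal encoding of timestamps, and a causal mask that zeroes attention to future neighbors by setting their scores to negative infinity before the softmax (timestamps and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def temporal_encoding(ts, dim=8):
    """Sinusoidal encoding of timestamps (days), like positional encodings."""
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    ang = np.outer(ts, freqs)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

# Neighbor timestamps (days) and raw attention scores for one query node.
neighbor_ts = np.array([1.0, 5.0, 9.0, 14.0])
scores = rng.standard_normal(4)
t_pred = 8.0  # prediction time

# Causal mask: neighbors at or after t_pred get -inf before the softmax,
# so their attention weight is exactly zero -- no leakage by construction.
masked = np.where(neighbor_ts < t_pred, scores, -np.inf)
w = np.exp(masked - masked[np.isfinite(masked)].max())
w = w / w.sum()
print(w)  # last two entries are 0: those neighbors are in the future
```

Masking at the score level (rather than filtering rows beforehand) keeps the graph static while the visible neighborhood changes with the prediction time.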

RelBench results

The RelBench benchmark evaluates models on 30 prediction tasks across 7 real enterprise databases (e-commerce, healthcare, social networks, etc.) totaling 103 million rows:

  • LightGBM (flat table): 62.44 average AUROC. Requires manual feature engineering for each task.
  • GNN (message passing): 75.83 average AUROC. Automatic feature learning from graph structure.
  • KumoRFM (zero-shot): 76.71 average AUROC. No training on the target database at all.
  • KumoRFM (fine-tuned): 81.14 average AUROC. Fine-tuned on the target task.

The 30% relative improvement over LightGBM comes from automatically discovering cross-table patterns that manual feature engineering cannot anticipate.

From research to production

Building a relational graph transformer from scratch in PyG requires:

  1. Converting your database schema into HeteroData
  2. Implementing type-specific projections for each table
  3. Adding temporal encodings and causal masking
  4. Building the multi-head relational attention mechanism
  5. Implementing neighbor sampling for scalable training
  6. Setting up training loops with proper temporal train/test splits
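For step 6, a minimal sketch of a temporal train/test split (the cutoff date and rows are illustrative): rows are partitioned by timestamp so the model never trains on events after the evaluation cutoff.

```python
from datetime import date

# Toy event rows with timestamps.
events = [
    {"order_id": 1, "order_date": date(2024, 1, 5)},
    {"order_id": 2, "order_date": date(2024, 3, 20)},
    {"order_id": 3, "order_date": date(2024, 6, 1)},
    {"order_id": 4, "order_date": date(2024, 8, 15)},
]

cutoff = date(2024, 5, 1)  # train on history before this, evaluate after

train = [e for e in events if e["order_date"] < cutoff]
test = [e for e in events if e["order_date"] >= cutoff]
print(len(train), len(test))  # 2 2
```

A random split would let the model peek at future rows; splitting on the timestamp column mirrors the causal masking used inside the attention layers.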

KumoRFM packages all of this into a single API call. You point it at your database, write one line of PQL (Predictive Query Language), and get predictions. The relational graph transformer runs under the hood.

Frequently asked questions

What is a relational graph transformer?

A relational graph transformer is a graph transformer architecture specialized for data from relational databases. It handles multiple node types (one per table), multiple edge types (one per foreign key), temporal ordering, and varying feature spaces per table. It combines the global attention of transformers with relation-aware, type-specific transformations.

How does it differ from a standard graph transformer?

A standard graph transformer treats all nodes uniformly with one feature space. A relational graph transformer handles heterogeneous types: different tables have different columns, different foreign keys have different semantics, and timestamps create causal ordering. Each type gets its own projection, and attention is type-aware.

What is KumoRFM?

KumoRFM (Relational Foundation Model) is Kumo.ai's production relational graph transformer. It is pre-trained on diverse relational databases and can make zero-shot predictions on new databases without training. It converts any SQL database schema into a heterogeneous temporal graph and applies type-aware attention to generate predictions.

Why not just use LightGBM on flat tables?

Flat-table ML requires manually joining and aggregating related tables (e.g., counting orders per customer in the last 30 days). This misses complex multi-hop and structural patterns. On the RelBench benchmark (7 databases, 30 tasks), relational graph transformers achieve 81.14 AUROC vs 62.44 for LightGBM, a 30% relative improvement.

How does it handle temporal data in relational databases?

Timestamp columns define causal ordering. When predicting an outcome at time t, the model only attends to rows with timestamps before t. This prevents data leakage automatically. Temporal encodings are added to capture recency and periodicity patterns.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.