
Relational Graph Transformers: Attention for Relational Databases

A relational graph transformer applies type-aware attention to heterogeneous graphs built from relational databases. Each table is a node type, each foreign key is an edge type, and the transformer learns cross-table patterns that flat-table ML cannot see.


TL;DR

  • A relational graph transformer extends graph transformers to heterogeneous relational data. Each database table is a node type, each foreign key is an edge type, and attention is type-aware.
  • It handles the three challenges of relational data: heterogeneous types (different columns per table), temporal ordering (causal constraints from timestamps), and scale (millions of rows across dozens of tables).
  • On the RelBench benchmark (7 databases, 30 tasks, 103M rows), relational graph transformers achieve 81.14 AUROC fine-tuned vs 62.44 for flat-table LightGBM.
  • KumoRFM is a pre-trained relational graph transformer that makes zero-shot predictions on new databases. No training, no feature engineering, no data science team required.
  • The architecture combines type-specific projections, relational attention heads, temporal encodings, and causal masking into a single end-to-end model.

A relational graph transformer is a graph transformer architecture specialized for data from relational databases. Enterprise data lives in relational databases: customers in one table, orders in another, products in a third, connected by foreign keys. A relational graph transformer converts this schema into a heterogeneous temporal graph and applies type-aware attention to learn cross-table patterns automatically.

This is the architecture behind KumoRFM, which achieves state-of-the-art results on the RelBench benchmark. It handles the three core challenges of enterprise relational data: heterogeneous types, temporal ordering, and massive scale.

From database to graph

The conversion from relational database to graph is mechanical:

  • Each table becomes a node type (customers, orders, products)
  • Each row becomes a node with features from its columns
  • Each foreign key becomes an edge type (customer placed order, order contains product)
  • Each timestamp column defines causal ordering for that node type
database_to_graph.py
# Conceptual: how a relational graph transformer sees your database

# customers table -> 'customer' node type
#   columns: customer_id, age, region, signup_date
#   -> node features: [age, region_embedding, days_since_signup]

# orders table -> 'order' node type
#   columns: order_id, customer_id, amount, order_date
#   -> node features: [amount_normalized, day_of_week, month]

# products table -> 'product' node type
#   columns: product_id, price, category, rating
#   -> node features: [price_normalized, category_embedding, rating]

# Foreign keys -> edge types
# orders.customer_id -> customers.customer_id
#   => ('order', 'placed_by', 'customer') edges
# order_items.product_id -> products.product_id
#   => ('order', 'contains', 'product') edges

# Causal constraint: order_date determines temporal ordering
# At prediction time t, only orders before t are visible

A relational graph transformer reads your database schema and builds the graph automatically. No manual graph construction.
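To make the conversion concrete, here is a minimal sketch of the table-to-graph mapping in plain Python. The dicts stand in for what PyG's HeteroData would hold (node features per type, edge index per edge type); the tables and column values are toy examples, not a real schema.

```python
# Toy relational tables (illustrative rows and columns).
customers = [
    {"customer_id": 1, "age": 34, "region": "EU"},
    {"customer_id": 2, "age": 28, "region": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 99.0},
    {"order_id": 11, "customer_id": 1, "amount": 15.5},
    {"order_id": 12, "customer_id": 2, "amount": 42.0},
]

# Each table -> a node type; each row -> a node, indexed by position.
node_features = {
    "customer": [[c["age"]] for c in customers],
    "order": [[o["amount"]] for o in orders],
}

# Each foreign key -> an edge type, stored as (src_index, dst_index) pairs.
cust_index = {c["customer_id"]: i for i, c in enumerate(customers)}
edges = {
    ("order", "placed_by", "customer"): [
        (i, cust_index[o["customer_id"]]) for i, o in enumerate(orders)
    ],
}

print(edges[("order", "placed_by", "customer")])  # [(0, 0), (1, 0), (2, 1)]
```

In PyG the same structure lands in `data['customer'].x`, `data['order'].x`, and `data['order', 'placed_by', 'customer'].edge_index`.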

Architecture components

Type-specific projections

Each node type has different columns with different semantics. Type-specific linear projections map each table's features into a shared hidden dimension:

  • Customer features (3 dims) projected to 128 dims
  • Order features (3 dims) projected to 128 dims
  • Product features (3 dims) projected to 128 dims
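A minimal NumPy sketch of the projection step (dimensions and random matrices are illustrative stand-ins; in a real model each projection would be a learned linear layer per node type):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 128  # shared hidden dimension

# One projection matrix per node type; input dims differ per table.
in_dims = {"customer": 3, "order": 3, "product": 3}
proj = {t: rng.standard_normal((d, hidden)) for t, d in in_dims.items()}

# Raw per-table features (toy values).
feats = {
    "customer": rng.standard_normal((2, 3)),  # 2 customers x 3 features
    "order": rng.standard_normal((5, 3)),     # 5 orders x 3 features
    "product": rng.standard_normal((4, 3)),   # 4 products x 3 features
}

# Project every node type into the shared hidden space.
hidden_states = {t: feats[t] @ proj[t] for t in feats}
for t, h in hidden_states.items():
    print(t, h.shape)  # every type ends up (num_nodes, 128)
```

Once every table lives in the same 128-dimensional space, attention can mix information across types.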

Relational attention

Attention heads are relation-aware. A customer attending to its orders uses different attention weights than a customer attending to its support tickets. Each edge type has learned query-key projections that capture relation-specific importance patterns.
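A single-head NumPy sketch of relation-specific attention, assuming one learned query/key projection pair per edge type (the random matrices below are stand-ins for learned weights):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # hidden dim

# One learned (query, key) projection pair per edge type.
W_q = {"placed_by": rng.standard_normal((d, d)),
       "opened_ticket": rng.standard_normal((d, d))}
W_k = {"placed_by": rng.standard_normal((d, d)),
       "opened_ticket": rng.standard_normal((d, d))}

def relation_attention(dst, srcs, rel):
    """Attention weights of one node over its neighbors under edge type `rel`."""
    q = dst @ W_q[rel]            # (d,)
    k = srcs @ W_k[rel]           # (n, d)
    scores = k @ q / np.sqrt(d)   # (n,)
    w = np.exp(scores - scores.max())
    return w / w.sum()            # softmax over the neighborhood

customer = rng.standard_normal(d)
orders = rng.standard_normal((3, d))
w = relation_attention(customer, orders, "placed_by")
print(w.round(3), w.sum())  # weights over 3 orders, summing to 1
```

Because `W_q` and `W_k` are indexed by edge type, the same customer embedding attends differently to orders than to support tickets.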

Temporal encodings and causal masking

Timestamp columns are converted into temporal encodings (similar to positional encodings in language models). Causal masking ensures that when predicting at time t, attention weights are zero for nodes with timestamps after t. This prevents data leakage by construction.
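A NumPy sketch of both pieces: a sinusoidal temporal encoding of timestamps, and a causal mask that zeroes attention to future neighbors by setting their scores to negative infinity before the softmax (timestamps and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def temporal_encoding(ts, dim=8):
    """Sinusoidal encoding of timestamps (days), like positional encodings."""
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    ang = np.outer(ts, freqs)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

# Neighbor timestamps (days) and raw attention scores for one query node.
neighbor_ts = np.array([1.0, 5.0, 9.0, 14.0])
scores = rng.standard_normal(4)
t_pred = 8.0  # prediction time

# Causal mask: neighbors at or after t_pred get -inf before the softmax,
# so their attention weight is exactly zero -- no leakage by construction.
masked = np.where(neighbor_ts < t_pred, scores, -np.inf)
w = np.exp(masked - masked[np.isfinite(masked)].max())
w = w / w.sum()
print(w)  # last two entries are 0: those neighbors are in the future
```

Masking at the score level (rather than filtering rows beforehand) keeps the graph static while the visible neighborhood changes with the prediction time.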

RelBench results

The RelBench benchmark evaluates models on 30 prediction tasks across 7 real enterprise databases (e-commerce, healthcare, social networks, etc.) totaling 103 million rows:

  • LightGBM (flat table): 62.44 average AUROC. Requires manual feature engineering for each task.
  • GNN (message passing): 75.83 average AUROC. Automatic feature learning from graph structure.
  • KumoRFM (zero-shot): 76.71 average AUROC. No training on the target database at all.
  • KumoRFM (fine-tuned): 81.14 average AUROC. Fine-tuned on the target task.

The 30% relative improvement over LightGBM comes from automatically discovering cross-table patterns that manual feature engineering cannot anticipate.

From research to production

Building a relational graph transformer from scratch in PyG requires:

  1. Converting your database schema into HeteroData
  2. Implementing type-specific projections for each table
  3. Adding temporal encodings and causal masking
  4. Building the multi-head relational attention mechanism
  5. Implementing neighbor sampling for scalable training
  6. Setting up training loops with proper temporal train/test splits
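For step 6, a minimal sketch of a temporal train/test split (the cutoff date and rows are illustrative): rows are partitioned by timestamp so the model never trains on events after the evaluation cutoff.

```python
from datetime import date

# Toy event rows with timestamps.
events = [
    {"order_id": 1, "order_date": date(2024, 1, 5)},
    {"order_id": 2, "order_date": date(2024, 3, 20)},
    {"order_id": 3, "order_date": date(2024, 6, 1)},
    {"order_id": 4, "order_date": date(2024, 8, 15)},
]

cutoff = date(2024, 5, 1)  # train on history before this, evaluate after

train = [e for e in events if e["order_date"] < cutoff]
test = [e for e in events if e["order_date"] >= cutoff]
print(len(train), len(test))  # 2 2
```

A random split would let the model peek at future rows; splitting on the timestamp column mirrors the causal masking used inside the attention layers.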

KumoRFM packages all of this into a single API call. You point it at your database, write one line of PQL (Predictive Query Language), and get predictions. The relational graph transformer runs under the hood.

Frequently asked questions

What is a relational graph transformer?

A relational graph transformer is a graph transformer architecture specialized for data from relational databases. It handles multiple node types (one per table), multiple edge types (one per foreign key), temporal ordering, and varying feature spaces per table. It combines the global attention of transformers with relation-aware, type-specific transformations.

How does it differ from a standard graph transformer?

A standard graph transformer treats all nodes uniformly with one feature space. A relational graph transformer handles heterogeneous types: different tables have different columns, different foreign keys have different semantics, and timestamps create causal ordering. Each type gets its own projection, and attention is type-aware.

What is KumoRFM?

KumoRFM (Relational Foundation Model) is Kumo.ai's production relational graph transformer. It is pre-trained on diverse relational databases and can make zero-shot predictions on new databases without training. It converts any SQL database schema into a heterogeneous temporal graph and applies type-aware attention to generate predictions.

Why not just use LightGBM on flat tables?

Flat-table ML requires manually joining and aggregating related tables (e.g., counting orders per customer in the last 30 days). This misses complex multi-hop and structural patterns. On the RelBench benchmark (7 databases, 30 tasks), relational graph transformers achieve 81.14 AUROC vs 62.44 for LightGBM, a 30% relative improvement.

How does it handle temporal data in relational databases?

Timestamp columns define causal ordering. When predicting an outcome at time t, the model only attends to rows with timestamps before t. This prevents data leakage automatically. Temporal encodings are added to capture recency and periodicity patterns.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.