A relational graph transformer is a graph transformer architecture specialized for data from relational databases. Enterprise data lives in relational databases: customers in one table, orders in another, products in a third, connected by foreign keys. A relational graph transformer converts this schema into a heterogeneous temporal graph and applies type-aware attention to learn cross-table patterns automatically.
This is the architecture behind KumoRFM, which achieves state-of-the-art results on the RelBench benchmark. It handles the three core challenges of enterprise relational data: heterogeneous types, temporal ordering, and massive scale.
From database to graph
The conversion from relational database to graph is mechanical:
- Each table becomes a node type (customers, orders, products)
- Each row becomes a node with features from its columns
- Each foreign key becomes an edge type (customer placed order, order contains product)
- Each timestamp column defines causal ordering for that node type
```python
# Conceptual: how a relational graph transformer sees your database

# customers table -> 'customer' node type
#   columns: customer_id, age, region, signup_date
#   -> node features: [age, region_embedding, days_since_signup]

# orders table -> 'order' node type
#   columns: order_id, customer_id, amount, order_date
#   -> node features: [amount_normalized, day_of_week, month]

# products table -> 'product' node type
#   columns: product_id, price, category, rating
#   -> node features: [price_normalized, category_embedding, rating]

# Foreign keys -> edge types
#   orders.customer_id -> customers.customer_id
#     => ('order', 'placed_by', 'customer') edges
#   order_items.product_id -> products.product_id
#     => ('order', 'contains', 'product') edges

# Causal constraint: order_date determines temporal ordering
#   At prediction time t, only orders before t are visible
```

A relational graph transformer reads your database schema and builds the graph automatically. No manual graph construction.
Architecture components
Type-specific projections
Each node type has different columns with different semantics. Type-specific linear projections map each table's features into a shared hidden dimension:
- Customer features (3 dims) projected to 128 dims
- Order features (3 dims) projected to 128 dims
- Product features (3 dims) projected to 128 dims
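A minimal sketch of these per-type projections, using NumPy with random weights standing in for trained parameters (the dimensions follow the example above; the names are illustrative, not KumoRFM's API):

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 128

# One learned projection matrix per node type. Each table's raw feature
# dimension may differ; all are mapped into the same hidden space.
feature_dims = {"customer": 3, "order": 3, "product": 3}
projections = {t: rng.normal(size=(d, HIDDEN)) for t, d in feature_dims.items()}

def project(node_type, features):
    """Map a batch of type-specific features into the shared hidden space."""
    return features @ projections[node_type]

customer_batch = rng.normal(size=(5, 3))   # 5 customers, 3 raw features each
hidden = project("customer", customer_batch)  # shape (5, 128)
```

Once every node type lives in the same 128-dimensional space, a single attention mechanism can mix information across tables.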
Relational attention
Attention heads are relation-aware. A customer attending to its orders uses different attention weights than a customer attending to its support tickets. Each edge type has learned query-key projections that capture relation-specific importance patterns.
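The idea can be sketched as scaled dot-product attention with relation-specific query/key projections. This is a simplified single-head, single-query version with random weights (the edge-type names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16  # hidden dimension, kept small for the sketch

# A separate query/key projection pair per edge type, so a customer
# attending to its orders uses different weights than a customer
# attending to its support tickets.
edge_types = ["placed_order", "opened_ticket"]
Wq = {r: rng.normal(size=(D, D)) for r in edge_types}
Wk = {r: rng.normal(size=(D, D)) for r in edge_types}

def relational_attention(query_node, neighbors, relation):
    """Attention-pool neighbor features using relation-specific projections."""
    q = query_node @ Wq[relation]            # (D,)
    k = neighbors @ Wk[relation]             # (n, D)
    scores = k @ q / np.sqrt(D)              # scaled dot-product scores
    weights = np.exp(scores - scores.max())  # stable softmax
    weights /= weights.sum()
    return weights @ neighbors               # weighted sum of neighbor messages

customer = rng.normal(size=D)
order_neighbors = rng.normal(size=(4, D))
message = relational_attention(customer, order_neighbors, "placed_order")
```

A full implementation would add multiple heads, value projections, and relation-specific biases, but the core point is the per-edge-type parameterization.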
Temporal encodings and causal masking
Timestamp columns are converted into temporal encodings (similar to positional encodings in language models). Causal masking ensures that when predicting at time t, attention weights are zero for nodes with timestamps after t. This prevents data leakage by construction.
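Both pieces can be sketched in a few lines: a sinusoidal time encoding, and a mask that sends attention scores to negative infinity for any neighbor not strictly before the prediction time (the timestamps here are invented integers):

```python
import numpy as np

def time_encoding(t, dim=8):
    """Sinusoidal encoding of a timestamp, analogous to positional encodings."""
    freqs = 1.0 / (10000 ** (np.arange(dim // 2) / (dim // 2)))
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])

def causal_attention_weights(scores, neighbor_times, t_pred):
    """Zero out attention to any neighbor whose timestamp is >= t_pred."""
    masked = np.where(neighbor_times < t_pred, scores, -np.inf)
    weights = np.exp(masked - masked.max())  # exp(-inf) -> exactly 0
    return weights / weights.sum()

scores = np.array([1.0, 2.0, 0.5, 3.0])
times = np.array([5, 12, 7, 20])   # hypothetical order timestamps
w = causal_attention_weights(scores, times, t_pred=10)
# Neighbors at t=12 and t=20 receive exactly zero attention weight.
```

Because the mask is applied before the softmax, future nodes contribute nothing to the prediction, so leakage is impossible by construction rather than prevented by careful data preparation.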
RelBench results
The RelBench benchmark evaluates models on 30 prediction tasks across 7 real enterprise databases (e-commerce, healthcare, social networks, etc.) totaling 103 million rows:
- LightGBM (flat table): 62.44 average AUROC. Requires manual feature engineering for each task.
- GNN (message passing): 75.83 average AUROC. Automatic feature learning from graph structure.
- KumoRFM (zero-shot): 76.71 average AUROC. No training on the target database at all.
- KumoRFM (fine-tuned): 81.14 average AUROC. Fine-tuned on the target task.
The roughly 30% relative improvement of fine-tuned KumoRFM over LightGBM comes from automatically discovering cross-table patterns that manual feature engineering cannot anticipate.
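The relative-improvement arithmetic from the numbers above:

```python
# Relative gain of fine-tuned KumoRFM over the LightGBM baseline,
# using the average AUROC figures reported above.
lightgbm = 62.44
kumorfm_finetuned = 81.14
relative_gain = (kumorfm_finetuned - lightgbm) / lightgbm
print(f"{relative_gain:.1%}")  # just under 30%
```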
From research to production
Building a relational graph transformer from scratch in PyG requires:
- Converting your database schema into HeteroData
- Implementing type-specific projections for each table
- Adding temporal encodings and causal masking
- Building the multi-head relational attention mechanism
- Implementing neighbor sampling for scalable training
- Setting up training loops with proper temporal train/test splits
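The last step above, the temporal split, is where many hand-rolled pipelines leak data. A minimal sketch with hypothetical rows: everything before the cutoff trains, everything at or after it evaluates, mirroring the causal masking inside the model:

```python
# Hypothetical order rows; ISO-format date strings compare correctly
# as plain strings, so no date parsing is needed for the sketch.
rows = [
    {"order_id": 1, "order_date": "2023-01-10"},
    {"order_id": 2, "order_date": "2023-03-02"},
    {"order_id": 3, "order_date": "2023-06-15"},
    {"order_id": 4, "order_date": "2023-09-01"},
]
cutoff = "2023-05-01"

train = [r for r in rows if r["order_date"] < cutoff]
test = [r for r in rows if r["order_date"] >= cutoff]
```

A random row-level split would instead let the model train on orders from the future of its test set, inflating metrics that collapse in production.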
KumoRFM packages all of this into a single API call. You point it at your database, write one line of PQL (Predictive Query Language), and get predictions. The relational graph transformer runs under the hood.