Almost every enterprise prediction task is a multi-table problem. Will this customer churn? That depends on their order history (orders table), support interactions (tickets table), browsing behavior (sessions table), and the products they bought (products table). Will this transaction be fraudulent? That depends on the account (accounts table), the device (devices table), the counterparty (the accounts table again), and the merchant (merchants table).
The prediction signal is spread across tables. Getting it into a model is the hardest part of enterprise ML.
The JOIN-and-aggregate bottleneck
Traditional ML requires a single training table: one row per prediction target with all features as columns. To use data from multiple tables, you must:
- JOIN: connect tables via foreign keys. One-to-many JOINs create row explosion (one customer row becomes 50 rows, one per order).
- Aggregate: collapse the explosion back to one row per target. COUNT, AVG, SUM, MAX over various groupings and time windows.
- Repeat: for every additional table, add more JOINs and aggregations. Each table interaction multiplies the complexity.
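The JOIN-and-aggregate step can be sketched in a few lines of plain Python. The table rows and column names below are hypothetical; the point is that every feature (COUNT, SUM, AVG, per table, per time window) must be hand-picked and computed explicitly.

```python
from statistics import mean

# Toy tables (hypothetical schema): one dict per row.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 50.0},
    {"order_id": 11, "customer_id": 1, "amount": 30.0},
    {"order_id": 12, "customer_id": 2, "amount": 20.0},
]

# JOIN + aggregate: collapse each customer's orders back into one row
# of hand-chosen features. This manual step must be repeated for every
# table, aggregation function, and time window.
features = {}
for c in customers:
    amts = [o["amount"] for o in orders if o["customer_id"] == c["customer_id"]]
    features[c["customer_id"]] = {
        "order_count": len(amts),
        "total_spend": sum(amts),
        "avg_order_amount": mean(amts) if amts else 0.0,
    }

print(features[1])  # {'order_count': 2, 'total_spend': 80.0, 'avg_order_amount': 40.0}
```

Every additional table adds another loop like this, and every one-to-many relationship forces another round of collapsing.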
The number of possible features grows combinatorially with the number of tables. With 7 tables, several numeric columns per table, 10 aggregation functions, and 3 time windows, the candidate features number in the thousands. A data scientist must decide which ones to compute, and most enterprise ML teams spend 2-6 months on this step.
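The combinatorics are easy to make concrete. The column count below is a hypothetical illustration; the other numbers come from the text.

```python
# Candidate feature count: tables x numeric columns x aggregation
# functions x time windows. cols_per_table is an assumed value.
tables, cols_per_table, agg_fns, windows = 7, 5, 10, 3
candidate_features = tables * cols_per_table * agg_fns * windows
print(candidate_features)  # 1050
```

And this still ignores features built on JOINs that span three or more tables, which multiply the space further.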
The graph approach
With relational deep learning, multi-table prediction works differently:
- No JOINs: each table is a separate node type. Rows stay in their original tables.
- No aggregation: individual order rows, individual ticket rows, individual product rows remain as separate nodes. No information is compressed.
- Cross-table information flows through message passing: order nodes send messages to customer nodes. Product nodes send messages to order nodes. The GNN learns which information matters.
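One round of message passing can be shown with a minimal pure-Python sketch (hypothetical data). A real GNN learns the aggregation function; here a fixed mean stands in for it, purely to show how information flows along foreign-key edges instead of through JOINs.

```python
from statistics import mean

# Each order row is a node; the foreign key to its customer is an edge.
orders = {  # order_id -> (customer_id, amount)
    10: (1, 50.0),
    11: (1, 30.0),
    12: (2, 20.0),
}
customer_inbox = {1: [], 2: []}

# Message passing: every order node sends its feature to its customer.
for order_id, (customer_id, amount) in orders.items():
    customer_inbox[customer_id].append(amount)

# Customer nodes aggregate incoming messages. In a GNN this step is
# learned; the mean here is a stand-in.
customer_embedding = {c: mean(msgs) for c, msgs in customer_inbox.items()}
print(customer_embedding)  # {1: 40.0, 2: 20.0}
```

Note that the individual order amounts are still available in each inbox before aggregation; nothing forces the model to collapse them into a single summary statistic up front.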
Example: churn prediction across 5 tables
Database: customers, orders, order_items, products, support_tickets.
- Layer 1: customer node aggregates messages from its orders (recency, amounts) and support tickets (count, severity). Order nodes aggregate messages from their order_items.
- Layer 2: customer node now sees product-level information through orders (which products, what categories, return rates). It also sees the resolution of support tickets (resolved, unresolved, escalated).
- Layer 3: customer node sees patterns of other customers who bought the same products. If those customers churned, this customer is at higher risk.
After 3 layers, the customer embedding encodes signals from all 5 tables without a single SQL JOIN. The GNN learned which cross-table patterns predict churn.
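The layer-by-layer flow above can be sketched with two rounds of message passing over a toy schema (table contents and category values are hypothetical): layer 1 moves order_item information into orders, layer 2 moves it on into customers.

```python
# Toy edges (hypothetical data).
order_items = [  # (order_id, product_category)
    (10, "electronics"),
    (10, "books"),
    (11, "books"),
]
orders = {10: 1, 11: 2}  # order_id -> customer_id

# Layer 1: each order node collects the categories of its items.
order_msgs = {}
for order_id, category in order_items:
    order_msgs.setdefault(order_id, set()).add(category)

# Layer 2: each customer node collects categories through its orders --
# product-level signal reaches the customer without any SQL JOIN.
customer_categories = {}
for order_id, customer_id in orders.items():
    customer_categories.setdefault(customer_id, set()).update(
        order_msgs.get(order_id, set()))

print(customer_categories)  # {1: {'electronics', 'books'}, 2: {'books'}}
```

A third round in the reverse direction (products back out to the other customers who bought them) is what lets the model compare a customer against peers with the same purchase history.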
Why individual-level information matters
Consider two customers with the same average order amount of $50:
- Customer A: 10 orders, all $50 (consistent spender)
- Customer B: 10 orders, 9 at $10 and 1 at $410 (one-time splurge)
After aggregation, they look identical: avg_order_amount = $50. In the graph, they are clearly different: Customer A has 10 order nodes with similar amounts; Customer B has 9 low-value nodes and 1 high-value outlier. The GNN sees this distribution directly through message passing.
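The information loss is easy to verify directly, using the order amounts from the example above. The averages collide, but any distribution-level statistic over the individual rows (the view message passing preserves) separates the two customers.

```python
from statistics import mean, pstdev

# The two customers from the text: identical averages, different behavior.
customer_a = [50.0] * 10              # consistent spender
customer_b = [10.0] * 9 + [410.0]     # one-time splurge

# Flat-table aggregation collapses both to the same feature value...
assert mean(customer_a) == mean(customer_b) == 50.0

# ...but the per-node view still distinguishes them, e.g. via the
# spread of the incoming messages.
print(pstdev(customer_a), pstdev(customer_b))  # 0.0 120.0
```

A feature engineer can of course add avg, std, min, max, and so on by hand, but each is one more column to choose; the graph keeps the full distribution available by default.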
Benchmark results
On RelBench multi-table tasks (7 databases, 30 tasks, 103 million rows), the multi-table advantage is clear:
- Flat-table LightGBM (expert features): 62.44 average AUROC
- GNN on relational graph (no feature engineering): 75.83 average AUROC
- KumoRFM fine-tuned: 81.14 average AUROC
The 13+ point improvement comes almost entirely from preserving multi-table relationships that flat-table aggregation destroys.