Multi-Table Prediction: Predicting Outcomes That Depend on Data Across Many Tables

Enterprise predictions are inherently multi-table. Customer churn depends on orders, support tickets, product interactions, and session behavior, each in a different table. GNNs process all tables simultaneously through the relational graph.

PyTorch Geometric

TL;DR

  • Most enterprise predictions depend on data across 5-50 tables. Customer churn requires orders + support tickets + sessions + products. Fraud requires transactions + devices + counterparties + merchants.
  • Traditional ML flattens multi-table data into one row per target via JOINs and aggregations. This creates the feature engineering bottleneck: months of SQL, information loss through aggregation, and a fragile pipeline.
  • GNNs process multi-table data natively: each table is a node type, each FK is an edge type. Message passing propagates information across tables without JOINs or aggregations. Two layers = two table hops.
  • The information advantage: GNNs preserve individual-row-level information across all tables. Traditional ML must aggregate (avg, count, max), losing distributions, trends, and combinatorial patterns.
  • On RelBench multi-table tasks, GNNs improve AUROC by 13+ points over flat-table models. The gap grows with the number of tables because more relational structure means more patterns to discover.

Almost every enterprise prediction task is a multi-table problem. Will this customer churn? That depends on their order history (orders table), support interactions (tickets table), browsing behavior (sessions table), and the products they bought (products table). Will this transaction be fraud? That depends on the account (accounts table), the device (devices table), the counterparty (accounts table again), and the merchant (merchants table).

The prediction signal is spread across tables. Getting it into a model is the hardest part of enterprise ML.

The JOIN-and-aggregate bottleneck

Traditional ML requires a single training table: one row per prediction target with all features as columns. To use data from multiple tables, you must:

  1. JOIN: connect tables via foreign keys. One-to-many JOINs create row explosion (one customer row becomes 50 rows, one per order).
  2. Aggregate: collapse the explosion back to one row per target. COUNT, AVG, SUM, MAX over various groupings and time windows.
  3. Repeat: for every additional table, add more JOINs and aggregations. Each table interaction multiplies the complexity.

The number of possible features grows combinatorially with the number of tables. With 7 tables, 10 aggregation functions, and 3 time windows, there are already hundreds of single-hop feature candidates, and counting multi-hop join paths pushes the space into the thousands. A data scientist must decide which ones to compute. Most enterprise ML teams spend 2-6 months on this step.

The graph approach

With relational deep learning, multi-table prediction works differently:

  • No JOINs: each table is a separate node type. Rows stay in their original tables.
  • No aggregation: individual order rows, individual ticket rows, individual product rows remain as separate nodes. No information is compressed.
  • Cross-table information flows through message passing: order nodes send messages to customer nodes. Product nodes send messages to order nodes. The GNN learns which information matters.

Example: churn prediction across 5 tables

Database: customers, orders, order_items, products, support_tickets.

  • Layer 1: customer node aggregates messages from its orders (recency, amounts) and support tickets (count, severity). Order nodes aggregate messages from their order_items.
  • Layer 2: customer node now sees product-level information through orders (which products, what categories, return rates). It also sees the resolution of support tickets (resolved, unresolved, escalated).
  • Layer 3: customer node sees patterns of other customers who bought the same products. If those customers churned, this customer is at higher risk.

After 3 layers, the customer embedding encodes signals from all 5 tables without a single SQL JOIN. The GNN learned which cross-table patterns predict churn.
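The layer-by-layer flow can be simulated with plain Python sets: one message-passing layer moves information one FK hop. The toy schema below mirrors the 5-table example; the node IDs are made up:

```python
# Toy simulation: which table types a customer node has "seen" after k
# message-passing layers. Edges follow the FK structure of the 5-table
# example; the specific IDs are invented.
edges = {
    "customer#1": ["order#10", "ticket#7"],
    "order#10": ["order_item#100", "order_item#101"],
    "order_item#100": ["product#5"],
    "order_item#101": ["product#6"],
    "ticket#7": [],
}

def tables_seen(node, layers):
    """Table types reachable from `node` within `layers` hops."""
    frontier, seen = {node}, {node}
    for _ in range(layers):
        frontier = {nbr for n in frontier for nbr in edges.get(n, [])}
        seen |= frontier
    return {n.split("#")[0] for n in seen}

for k in range(4):
    print(k, sorted(tables_seen("customer#1", k)))
```

At k=3 all five table types are reachable. Real GNN message passing also flows along reverse edges, which is how layer 3 in the example above reaches other customers of the same products.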

Why individual-level information matters

Consider two customers with the same average order amount of $50:

  • Customer A: 10 orders, all $50 (consistent spender)
  • Customer B: 10 orders, 9 at $10 and 1 at $410 (one-time splurge)

After aggregation, they look identical: avg_order_amount = $50. In the graph, they are clearly different: Customer A has 10 order nodes with similar amounts; Customer B has 9 low-value nodes and 1 high-value outlier. The GNN sees this distribution directly through message passing.
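A minimal numeric check of this point (amounts chosen to match the example above):

```python
# Two customers with identical average order amount but different behavior.
from statistics import mean, stdev

customer_a = [50.0] * 10           # consistent spender
customer_b = [10.0] * 9 + [410.0]  # one-time splurge

# After aggregation, the two are indistinguishable:
assert mean(customer_a) == mean(customer_b) == 50.0

# As individual order nodes, the difference is obvious, e.g. in spread:
print("std A:", stdev(customer_a))             # 0.0
print("std B:", round(stdev(customer_b), 1))
```

A flat feature table that only stores `avg_order_amount` can never recover this distinction; the graph keeps it for free because every order is its own node.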

Benchmark results

On RelBench multi-table tasks (7 databases, 30 tasks, 103 million rows), the multi-table advantage is clear:

  • Flat-table LightGBM (expert features): 62.44 average AUROC
  • GNN on relational graph (no feature engineering): 75.83 average AUROC
  • KumoRFM fine-tuned: 81.14 average AUROC

The 13+ point improvement comes almost entirely from preserving multi-table relationships that flat-table aggregation destroys.

Frequently asked questions

What is multi-table prediction?

Multi-table prediction is any ML task where the target depends on data spread across multiple database tables. Customer churn depends on order history, support tickets, product interactions, and session behavior. Fraud detection depends on transaction patterns, device information, counterparty behavior, and merchant history. These signals live in different tables connected by foreign keys.

Why is multi-table prediction hard for traditional ML?

Traditional ML requires a single flat feature table. To use data from multiple tables, you must join them (which can create row explosion for one-to-many relationships), aggregate (which loses information), and engineer features manually for every cross-table relationship. Each new table added to the model requires new JOINs, new aggregations, and re-validation of the entire feature pipeline.

How do GNNs handle multi-table prediction?

GNNs represent the multi-table database as a heterogeneous graph (tables = node types, foreign keys = edges) and use message passing to propagate information across tables. Each layer of message passing lets nodes absorb information from connected nodes in other tables. After 2-3 layers, a customer node's embedding incorporates signals from orders, products, support tickets, and sessions without any manual JOIN or aggregation.

How many tables can GNNs handle?

There is no inherent limit. The number of tables determines the number of node types and edge types in the heterogeneous graph. In practice, enterprise databases with 10-50 tables work well. The graph becomes richer (more paths for message passing to explore), which generally improves prediction quality. The computational cost scales with the number of edges (FK references), not the number of tables.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.