Multi-Table Prediction: Predicting Outcomes That Depend on Data Across Many Tables

Enterprise predictions are inherently multi-table. Customer churn depends on orders, support tickets, product interactions, and session behavior, each in a different table. GNNs process all tables simultaneously through the relational graph.

PyTorch Geometric

TL;DR

  • Most enterprise predictions depend on data across 5-50 tables. Customer churn requires orders + support tickets + sessions + products. Fraud requires transactions + devices + counterparties + merchants.
  • Traditional ML flattens multi-table data into one row per target via JOINs and aggregations. This creates the feature engineering bottleneck: months of SQL, information loss through aggregation, and a fragile pipeline.
  • GNNs process multi-table data natively: each table is a node type, each FK is an edge type. Message passing propagates information across tables without JOINs or aggregations. Two layers = two table hops.
  • The information advantage: GNNs preserve individual-row-level information across all tables. Traditional ML must aggregate (avg, count, max), losing distributions, trends, and combinatorial patterns.
  • On RelBench multi-table tasks, GNNs improve AUROC by 13+ points over flat-table models. The gap grows with the number of tables because more relational structure means more patterns to discover.

Almost every enterprise prediction task is a multi-table problem. Will this customer churn? That depends on their order history (orders table), support interactions (tickets table), browsing behavior (sessions table), and the products they bought (products table). Will this transaction be fraud? That depends on the account (accounts table), the device (devices table), the counterparty (accounts table again), and the merchant (merchants table).

The prediction signal is spread across tables. Getting it into a model is the hardest part of enterprise ML.

The JOIN-and-aggregate bottleneck

Traditional ML requires a single training table: one row per prediction target with all features as columns. To use data from multiple tables, you must:

  1. JOIN: connect tables via foreign keys. One-to-many JOINs create row explosion (one customer row becomes 50 rows, one per order).
  2. Aggregate: collapse the explosion back to one row per target. COUNT, AVG, SUM, MAX over various groupings and time windows.
  3. Repeat: for every additional table, add more JOINs and aggregations. Each table interaction multiplies the complexity.

The number of possible features grows combinatorially with the number of tables. With 7 tables, 10 aggregation functions, and 3 time windows, there are already hundreds of single-hop feature candidates, and counting multi-hop join paths pushes the space into the thousands. A data scientist must decide which ones to compute. Most enterprise ML teams spend 2-6 months on this step.

The graph approach

With relational deep learning, multi-table prediction works differently:

  • No JOINs: each table is a separate node type. Rows stay in their original tables.
  • No aggregation: individual order rows, individual ticket rows, individual product rows remain as separate nodes. No information is compressed.
  • Cross-table information flows through message passing: order nodes send messages to customer nodes. Product nodes send messages to order nodes. The GNN learns which information matters.

Example: churn prediction across 5 tables

Database: customers, orders, order_items, products, support_tickets.

  • Layer 1: customer node aggregates messages from its orders (recency, amounts) and support tickets (count, severity). Order nodes aggregate messages from their order_items.
  • Layer 2: customer node now sees product-level information through orders (which products, what categories, return rates). It also sees the resolution of support tickets (resolved, unresolved, escalated).
  • Layer 3: customer node sees patterns of other customers who bought the same products. If those customers churned, this customer is at higher risk.

After 3 layers, the customer embedding encodes signals from all 5 tables without a single SQL JOIN. The GNN learned which cross-table patterns predict churn.
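The layer-by-layer flow can be simulated with plain Python sets: one message-passing layer moves information one FK hop. The toy schema below mirrors the 5-table example; the node IDs are made up:

```python
# Toy simulation: which table types a customer node has "seen" after k
# message-passing layers. Edges follow the FK structure of the 5-table
# example; the specific IDs are invented.
edges = {
    "customer#1": ["order#10", "ticket#7"],
    "order#10": ["order_item#100", "order_item#101"],
    "order_item#100": ["product#5"],
    "order_item#101": ["product#6"],
    "ticket#7": [],
}

def tables_seen(node, layers):
    """Table types reachable from `node` within `layers` hops."""
    frontier, seen = {node}, {node}
    for _ in range(layers):
        frontier = {nbr for n in frontier for nbr in edges.get(n, [])}
        seen |= frontier
    return {n.split("#")[0] for n in seen}

for k in range(4):
    print(k, sorted(tables_seen("customer#1", k)))
```

At k=3 all five table types are reachable. Real GNN message passing also flows along reverse edges, which is how layer 3 in the example above reaches other customers of the same products.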

Why individual-level information matters

Consider two customers with the same average order amount of $50:

  • Customer A: 10 orders, all $50 (consistent spender)
  • Customer B: 10 orders, 9 at $10 and 1 at $410 (one-time splurge)

After aggregation, they look identical: avg_order_amount = $50. In the graph, they are clearly different: Customer A has 10 order nodes with similar amounts; Customer B has 9 low-value nodes and 1 high-value outlier. The GNN sees this distribution directly through message passing.
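A minimal numeric check of this point (amounts chosen to match the example above):

```python
# Two customers with identical average order amount but different behavior.
from statistics import mean, stdev

customer_a = [50.0] * 10           # consistent spender
customer_b = [10.0] * 9 + [410.0]  # one-time splurge

# After aggregation, the two are indistinguishable:
assert mean(customer_a) == mean(customer_b) == 50.0

# As individual order nodes, the difference is obvious, e.g. in spread:
print("std A:", stdev(customer_a))             # 0.0
print("std B:", round(stdev(customer_b), 1))
```

A flat feature table that only stores `avg_order_amount` can never recover this distinction; the graph keeps it for free because every order is its own node.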

Benchmark results

On RelBench multi-table tasks (7 databases, 30 tasks, 103 million rows), the multi-table advantage is clear:

  • Flat-table LightGBM (expert features): 62.44 average AUROC
  • GNN on relational graph (no feature engineering): 75.83 average AUROC
  • KumoRFM fine-tuned: 81.14 average AUROC

The 13+ point improvement comes almost entirely from preserving multi-table relationships that flat-table aggregation destroys.

Frequently asked questions

What is multi-table prediction?

Multi-table prediction is any ML task where the target depends on data spread across multiple database tables. Customer churn depends on order history, support tickets, product interactions, and session behavior. Fraud detection depends on transaction patterns, device information, counterparty behavior, and merchant history. These signals live in different tables connected by foreign keys.

Why is multi-table prediction hard for traditional ML?

Traditional ML requires a single flat feature table. To use data from multiple tables, you must join them (which can create row explosion for one-to-many relationships), aggregate (which loses information), and engineer features manually for every cross-table relationship. Each new table added to the model requires new JOINs, new aggregations, and re-validation of the entire feature pipeline.

How do GNNs handle multi-table prediction?

GNNs represent the multi-table database as a heterogeneous graph (tables = node types, foreign keys = edges) and use message passing to propagate information across tables. Each layer of message passing lets nodes absorb information from connected nodes in other tables. After 2-3 layers, a customer node's embedding incorporates signals from orders, products, support tickets, and sessions without any manual JOIN or aggregation.

How many tables can GNNs handle?

There is no inherent limit. The number of tables determines the number of node types and edge types in the heterogeneous graph. In practice, enterprise databases with 10-50 tables work well. The graph becomes richer (more paths for message passing to explore), which generally improves prediction quality. The computational cost scales with the number of edges (FK references), not the number of tables.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.