
Foundation Models for Structured Data: The GPT Moment for Enterprise Databases

GPT changed text. Stable Diffusion changed images. The same shift is happening for structured data, and most people haven't noticed. A single pre-trained model that works on any relational database, any prediction task, without training.

TL;DR

  • Foundation models have transformed text, images, audio, and protein data. Structured relational data was the last frontier, and the gap has now closed.
  • LLMs on serialized tables score 68.06 AUROC on RelBench. The failure is architectural: text sequences destroy foreign keys, multi-table joins, and temporal ordering.
  • The breakthrough: representing any relational database as a temporal heterogeneous graph, a universal format enabling pre-training across thousands of schemas.
  • KumoRFM achieves 76.71 AUROC zero-shot, outperforming even task-specific GNNs (75.83) without any training on the target database. Fine-tuned: 81.14.
  • PluRel scaling laws show relational foundation models improve predictably with scale (exponent -0.38, comparable to GPT-3's -0.34), meaning current results are a lower bound.

Foundation models have transformed every major data modality. GPT-4 handles text. Stable Diffusion and DALL-E handle images. Whisper handles audio. AlphaFold handles protein structures. Each followed the same pattern: a large model pre-trained on diverse data that generalizes to new tasks without task-specific training.

One modality has been conspicuously absent from this revolution: structured data. The tables, rows, and columns that store 80% of enterprise data. The relational databases that run every bank, every retailer, every hospital. Until recently, if you wanted predictions from this data, you still had to flatten tables, engineer features, and train a model from scratch. Every single time.

That gap has closed. Relational foundation models now exist. They are pre-trained on billions of rows across thousands of databases and generalize to new relational databases zero-shot. The implications for enterprise ML are as large as GPT's implications for text.

Why the relational data gap existed

Foundation models require two things: a universal representation that works across different datasets, and enough training data in that representation to learn generalizable patterns.

For text, the representation is a token sequence. Every sentence, document, and book can be tokenized the same way. The entire internet provides training data. For images, the representation is a pixel grid. Every photo, painting, and screenshot uses the same format.

Structured data had neither property. Every relational database has a different schema: different tables, different columns, different data types, different relationships. A database of e-commerce transactions looks nothing like a database of clinical trials. There was no universal representation that could absorb both.

And there was no public pool of relational databases equivalent to Common Crawl for text or LAION for images. Enterprise databases are private by definition. You cannot scrape them from the internet.

These two problems, universal representation and data availability, are why structured data was the last frontier for foundation models. Both have now been solved.

The failed approaches

LLMs on tables: serialize and hope

The most obvious approach was to use existing LLMs. Serialize the table as JSON, CSV, or markdown, paste it into the prompt, and ask for predictions. Multiple research groups tried this systematically.

The results were disappointing. On the RelBench benchmark (7 databases, 30 tasks, 103 million rows), Llama 3.2 3B achieved 68.06 AUROC on classification tasks. A supervised graph neural network on the same data achieved 75.83. KumoRFM achieved 76.71 zero-shot.

The failure is not about model size. It is architectural. LLMs process data as a sequence of tokens. When you serialize a table as text, you destroy the relational structure. Foreign key relationships become arbitrary text strings. Multi-table joins become impossible (they would exceed context windows). Temporal ordering becomes ambiguous. Numerical precision degrades through tokenization.

To see this concretely, here is how the e-commerce data below would look when serialized for an LLM versus represented as a graph.

What an LLM receives (serialized text)

| row | serialized_input |
| --- | --- |
| 1 | customer_id=C-801, name=Elena Vasquez, segment=Premium, ... |
| 2 | order_id=ORD-5001, customer_id=C-801, total=$247, ... |
| 3 | review_id=R-201, customer_id=C-801, product_id=PRD-44, rating=5, ... |
| 4 | review_id=R-204, customer_id=C-803, product_id=PRD-44, rating=1, ... |

The LLM sees 'customer_id=C-801' as a text token. It has no structural understanding that C-801 links to ORD-5001 via a foreign key, or that R-201 and R-204 both reference PRD-44. The relational graph is flattened into a string.

What a relational foundation model receives (graph)

| node | type | connected_to | edge_type |
| --- | --- | --- | --- |
| C-801 (Elena) | Customer | ORD-5001, ORD-5002, R-201 | placed, placed, wrote |
| ORD-5001 | Order | C-801, PRD-44 | placed_by, contains |
| PRD-44 | Product | R-201, R-202, R-204 | reviewed_by, reviewed_by, reviewed_by |
| R-204 | Review | C-803, PRD-44 | written_by, about |

The graph model sees that PRD-44 connects Elena (5-star review) to another customer C-803 (1-star review). The foreign key structure is preserved as edges. Multi-hop paths are traversable.

An LLM reading a serialized table is like a human reading a novel where every chapter is written in a different language, the chapters are shuffled, and the page numbers are missing. The information is technically present but the structure needed to interpret it is gone.

Single-table tabular foundation models

Several research groups built foundation models specifically for tabular data: TabPFN, CARTE, TabFM, and others. These models are pre-trained on diverse single-table datasets and can make predictions on new flat tables without training.

They work well within their scope. But their scope is limited to single flat tables. Enterprise data is not a single flat table. It is a relational database with 10 to 50 interconnected tables. Using a tabular foundation model on enterprise data still requires the same feature engineering step: flatten the relational database into a single table, then feed it to the model.

These models automate the modeling step on flat data. They do not address the 80% of the work, which is converting relational data into flat data.
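That flattening step is exactly what tabular foundation models still leave to you. A minimal sketch of what it looks like, using the customers, orders, and reviews from this article's e-commerce example (the `flatten` function and its aggregate column names are illustrative, not any library's API):

```python
from collections import defaultdict

# Toy rows mirroring the customers/orders/reviews tables in this article.
customers = [{"customer_id": "C-801"}, {"customer_id": "C-802"}, {"customer_id": "C-803"}]
orders = [
    {"order_id": "ORD-5001", "customer_id": "C-801", "total": 247.00},
    {"order_id": "ORD-5002", "customer_id": "C-801", "total": 89.50},
    {"order_id": "ORD-5003", "customer_id": "C-802", "total": 34.99},
    {"order_id": "ORD-5004", "customer_id": "C-803", "total": 512.00},
    {"order_id": "ORD-5005", "customer_id": "C-803", "total": 78.00},
]
reviews = [
    {"review_id": "R-201", "customer_id": "C-801", "rating": 5},
    {"review_id": "R-202", "customer_id": "C-802", "rating": 2},
    {"review_id": "R-203", "customer_id": "C-803", "rating": 4},
    {"review_id": "R-204", "customer_id": "C-803", "rating": 1},
]

def flatten(customers, orders, reviews):
    """Collapse the multi-table database into one engineered row per customer."""
    spend, n_orders = defaultdict(float), defaultdict(int)
    for o in orders:
        spend[o["customer_id"]] += o["total"]
        n_orders[o["customer_id"]] += 1
    ratings = defaultdict(list)
    for r in reviews:
        ratings[r["customer_id"]].append(r["rating"])
    flat = []
    for c in customers:
        cid = c["customer_id"]
        rs = ratings.get(cid, [])
        flat.append({
            "customer_id": cid,
            "num_orders": n_orders[cid],
            "total_spend": spend[cid],
            "avg_rating": sum(rs) / len(rs) if rs else None,
        })
    return flat

flat = flatten(customers, orders, reviews)
```

Every aggregate here (count, sum, average) is a hand-chosen feature, and every multi-hop or temporal signal the aggregates do not capture is lost before the model ever sees the data.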

What multi-table data actually looks like

To see why LLMs fail and relational foundation models succeed, consider a concrete e-commerce database. The signal that predicts customer behavior spans multiple tables simultaneously.

customers

| customer_id | name | segment | signup_date | region |
| --- | --- | --- | --- | --- |
| C-801 | Elena Vasquez | Premium | 2023-04-12 | West |
| C-802 | Tom Fischer | Standard | 2024-01-08 | Midwest |
| C-803 | Aisha Patel | Premium | 2022-11-20 | Northeast |

orders

| order_id | customer_id | total | date | channel |
| --- | --- | --- | --- | --- |
| ORD-5001 | C-801 | $247.00 | 2025-09-15 | Mobile App |
| ORD-5002 | C-801 | $89.50 | 2025-10-03 | Website |
| ORD-5003 | C-802 | $34.99 | 2025-10-28 | Mobile App |
| ORD-5004 | C-803 | $512.00 | 2025-08-20 | Website |
| ORD-5005 | C-803 | $78.00 | 2025-11-01 | Mobile App |

reviews

| review_id | customer_id | product_id | rating | date |
| --- | --- | --- | --- | --- |
| R-201 | C-801 | PRD-44 | 5 | 2025-09-18 |
| R-202 | C-802 | PRD-44 | 2 | 2025-11-02 |
| R-203 | C-803 | PRD-71 | 4 | 2025-08-25 |
| R-204 | C-803 | PRD-44 | 1 | 2025-11-05 |

Highlighted: two customers gave product PRD-44 low ratings. A foundation model links these reviews to purchase patterns of other PRD-44 buyers, propagating the quality signal across the graph.

The breakthrough: relational data as graphs

The key insight came from recognizing that every relational database is a graph. Each row is a node. Each foreign key is an edge. Timestamps create temporal ordering. A database with 15 tables, 100 million rows, and 500 million foreign key relationships is a temporal heterogeneous graph with 100 million nodes and 500 million edges.

This representation is universal. It works regardless of the schema. An e-commerce database (customers, orders, products, reviews) and a clinical trial database (patients, visits, diagnoses, prescriptions) both become temporal heterogeneous graphs with the same mathematical structure. The node types and edge types differ, but the graph operations (message passing, attention, aggregation) are identical.

Relational Deep Learning, published at ICML 2024 by Robinson, Fey, et al. (Stanford and Kumo.ai), formalized this approach. They introduced RelBench as the standard benchmark and showed that graph neural networks trained on the relational graph outperform manual feature engineering across 30 tasks on 7 databases.

This solved the representation problem. Any relational database, any schema, becomes a graph that a single architecture can process.
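The row-to-node, foreign-key-to-edge mapping is mechanical enough to sketch in a few lines. This is an illustration of the representation idea, not any particular library's implementation; the `build_graph` helper and its input format are invented for this example, with table and key names taken from the e-commerce schema above:

```python
# Map a relational database to a heterogeneous graph:
# each row becomes a typed node, each foreign key a typed edge.
def build_graph(tables, foreign_keys):
    """tables: {table_name: {"pk": pk_column, "rows": [row_dict, ...]}}
    foreign_keys: [(child_table, fk_column, parent_table, edge_type), ...]"""
    nodes = {}   # node_id -> (node_type, row attributes)
    edges = []   # (src_id, dst_id, edge_type)
    for name, spec in tables.items():
        for row in spec["rows"]:
            nodes[row[spec["pk"]]] = (name, row)
    for child, fk_col, parent, edge_type in foreign_keys:
        pk = tables[child]["pk"]
        for row in tables[child]["rows"]:
            edges.append((row[pk], row[fk_col], edge_type))
    return nodes, edges

tables = {
    "customers": {"pk": "customer_id",
                  "rows": [{"customer_id": "C-801"}, {"customer_id": "C-803"}]},
    "orders": {"pk": "order_id",
               "rows": [{"order_id": "ORD-5001", "customer_id": "C-801"}]},
    "reviews": {"pk": "review_id",
                "rows": [{"review_id": "R-204", "customer_id": "C-803",
                          "product_id": "PRD-44"}]},
}
fks = [("orders", "customer_id", "customers", "placed_by"),
       ("reviews", "customer_id", "customers", "written_by")]

nodes, edges = build_graph(tables, fks)
```

Nothing in `build_graph` depends on this particular schema: swap in clinical-trial tables and foreign keys, and the same function produces a graph with the same mathematical structure.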

KumoRFM: the first relational foundation model

KumoRFM is a graph transformer pre-trained on billions of rows across thousands of diverse relational databases. It is the first model that generalizes to new relational databases zero-shot, meaning it makes predictions on databases it has never seen during training.

Architecture

KumoRFM converts any input relational database into a temporal heterogeneous graph. It then applies a graph transformer with cross-table attention: each node can attend to nodes in other tables across foreign key edges, weighted by learned relevance. Temporal positional encodings ensure the model respects time ordering, so it does not leak future information into past predictions.

The architecture handles heterogeneous node types (each table has different columns), heterogeneous edge types (different foreign key relationships have different semantics), and temporal dynamics (the same entity at different points in time has different neighborhoods).
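The no-leakage constraint mentioned above reduces to a simple masking rule: when predicting for an entity as of time t, only events with timestamps at or before t may enter its neighborhood. A minimal sketch (the `visible_neighborhood` helper is illustrative; the order dates come from the e-commerce example in this article):

```python
from datetime import date

# Order events for one customer, as (event_id, timestamp) pairs.
events = [
    ("ORD-5001", date(2025, 9, 15)),
    ("ORD-5002", date(2025, 10, 3)),
    ("ORD-5005", date(2025, 11, 1)),
]

def visible_neighborhood(events, as_of):
    """Temporal masking: keep only events at or before the prediction
    time, so no future information leaks into a past prediction."""
    return [e for e, t in events if t <= as_of]

# Predicting as of mid-October, the November order must be invisible.
history = visible_neighborhood(events, date(2025, 10, 15))
```

In the actual architecture this filtering is expressed through temporal positional encodings and attention masks rather than a list comprehension, but the invariant being enforced is the same.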

Pre-training

During pre-training, KumoRFM learns to predict masked node attributes and future events across thousands of databases. This is analogous to how GPT learns by predicting the next token. The model discovers universal patterns that recur across relational data: recency effects (recent events predict near-term outcomes), frequency patterns (activity levels correlate with engagement), temporal dynamics (accelerating or decelerating trends), graph topology (cluster structure predicts behavior), and cross-table propagation (attributes propagate through foreign key paths).

These patterns are not hard-coded. They are learned from the data, and they transfer across domains. Recency effects in e-commerce purchases follow the same mathematical pattern as recency effects in clinical trial visits or financial transactions.

Zero-shot inference

At inference time, you point KumoRFM at a new relational database and describe your prediction task in Predictive Query Language (PQL). The model converts the database to a graph, applies its pre-trained attention layers, and returns predictions. No training. No feature engineering. No pipeline.

PQL Query

PREDICT SUM(orders.total, 0, 90) > 0
FOR EACH customers.customer_id

Will this customer make a purchase in the next 90 days? The model reads customers, orders, and reviews as a graph. It discovers that C-803's declining review scores and lengthening order gaps signal disengagement.

Output

| customer_id | purchase_probability | top_signal |
| --- | --- | --- |
| C-801 | 0.88 | Consistent order cadence, high review scores |
| C-802 | 0.41 | Single purchase, negative review on key product |
| C-803 | 0.29 | Declining review sentiment, widening order gaps |
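The PQL target itself has a precise meaning that is easy to state in plain code: `SUM(orders.total, 0, 90) > 0` asks, for each customer, whether their order totals in the 90 days after some anchor date sum to more than zero. A sketch of that label definition, using the orders table from this article and a hypothetical anchor date chosen for illustration:

```python
from datetime import date, timedelta

# (order_id, customer_id, total, date) rows from the orders table above.
orders = [
    ("ORD-5001", "C-801", 247.00, date(2025, 9, 15)),
    ("ORD-5002", "C-801", 89.50, date(2025, 10, 3)),
    ("ORD-5003", "C-802", 34.99, date(2025, 10, 28)),
    ("ORD-5004", "C-803", 512.00, date(2025, 8, 20)),
    ("ORD-5005", "C-803", 78.00, date(2025, 11, 1)),
]

def pql_target(orders, customer_id, anchor):
    """SUM(orders.total, 0, 90) > 0: does this customer spend anything
    in the 90-day window starting at the anchor date?"""
    window_end = anchor + timedelta(days=90)
    spend = sum(total for _, cid, total, d in orders
                if cid == customer_id and anchor <= d < window_end)
    return spend > 0

label = pql_target(orders, "C-801", date(2025, 9, 1))
```

The point of PQL is that you only write the target definition; the model supplies the features, the temporal splits, and the prediction, which is the work that traditionally consumed the pipeline.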

The numbers

On RelBench classification tasks (7 databases, 12 tasks):

| Approach | Avg AUROC | Training required |
| --- | --- | --- |
| LightGBM + manual features | 62.44 | 12.3 hours + training per task |
| Llama 3.2 3B (serialized tables) | 68.06 | Prompt engineering per task |
| Supervised GNN | 75.83 | Training per task, no feature eng. |
| KumoRFM (zero-shot) | 76.71 | None |
| KumoRFM (fine-tuned) | 81.14 | Minimal fine-tuning |

Two results stand out. First, KumoRFM zero-shot (no task-specific training whatsoever) outperforms a supervised GNN that was trained specifically for each task. Pre-training on diverse relational data produces better representations than training on any single database. Second, the LLM approach (Llama 3.2 3B) is 8.65 AUROC points below KumoRFM, confirming that text-based architectures are structurally wrong for relational data.

Current state: fragmented approaches

  • LLMs for text, CNNs for images, nothing for relational data
  • Every prediction task requires feature engineering from scratch
  • 80% of data science time spent on data preparation
  • Models trained from scratch for each database and task
  • Signal in multi-hop relationships goes undiscovered

Foundation model era for structured data

  • Single pre-trained model for any relational database
  • Zero-shot predictions without feature engineering
  • Multi-hop, temporal, and graph patterns captured automatically
  • PQL replaces months of pipeline work with one query
  • Performance improves predictably with model scale

PluRel scaling laws: why this gets better

One of the most important findings in the GPT research trajectory was scaling laws: the observation that language model performance improves predictably with model size, following a power law. This meant that investing in larger models was a reliable path to better performance, which justified the enormous compute investments of GPT-3 and GPT-4.

Kumo.ai researchers published PluRel (Power Laws for Unified Relational Learning), demonstrating that relational foundation models exhibit the same scaling behavior. The loss follows:

L(N) = 0.07 * N^(-0.38) + 0.36

Where N is the number of model parameters. The scaling exponent of -0.38 is comparable to the -0.34 observed for GPT-3 (Kaplan et al., 2020). This is not a coincidence. It suggests that relational data contains the same kind of deep, multi-scale structure that makes language data amenable to foundation model approaches.

The practical implication: doubling the model size produces a predictable improvement in accuracy. This means the current results are a lower bound. As compute and data scale increase, relational foundation models will continue improving along a known trajectory.
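The arithmetic behind "doubling the model size produces a predictable improvement" follows directly from the power law above. A small sketch evaluating it (the function name and the example parameter count are illustrative; the constants are the ones stated in this article):

```python
def plurel_loss(n_params, a=0.07, alpha=0.38, floor=0.36):
    """PluRel scaling law as given in the text: L(N) = 0.07 * N^(-0.38) + 0.36."""
    return a * n_params ** (-alpha) + floor

# The reducible part of the loss is everything above the 0.36 floor.
# Doubling N shrinks that gap by a constant factor of 2^(-0.38), about 23%,
# regardless of where you start on the curve.
gap_1x = plurel_loss(1e8) - 0.36          # reducible loss at N parameters
gap_2x = plurel_loss(2e8) - 0.36          # reducible loss at 2N parameters
ratio = gap_2x / gap_1x                   # constant: 2 ** -0.38
```

The constant ratio is what makes scaling a planning tool rather than a gamble: a target loss translates into a parameter budget by inverting the same formula.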

What this means for enterprise ML

The shift from task-specific models to foundation models in NLP took about 5 years (2018 GPT-1 to 2023 GPT-4 enterprise adoption). The structured data shift is following a compressed timeline because the playbook already exists.

For data science teams

The role of the data scientist shifts from pipeline builder to decision architect. Instead of spending months engineering features for one model, you spend days evaluating which prediction tasks create the most business value. PQL makes it possible to test a new prediction hypothesis in minutes. The bottleneck moves from "can we build this" to "should we build this."

For ML infrastructure

The feature store, training pipeline, model registry, and serving infrastructure that enterprise ML teams have built over the past decade were designed for the train-from-scratch paradigm. Foundation models collapse this stack. The database connects directly to the model. Predictions are served via API. The infrastructure overhead drops dramatically.

For business impact

DoorDash used this approach and saw a 1.8% engagement lift across 30 million users. Databricks saw a 5.4x conversion lift in lead scoring. Snowflake saw a 3.2x expansion revenue lift. These results came not from better models on the same data, but from models that access relational patterns invisible to flat-table approaches.

For competitive advantage

The organizations that adopt relational foundation models first gain a compounding advantage. While competitors spend months building one predictive model through traditional feature engineering, a foundation model approach lets you evaluate dozens of prediction tasks per week. You discover which predictions create the most value faster, deploy them faster, and iterate faster. Over time, this speed advantage compounds into a data moat: more predictions deployed means more feedback data, which means better fine-tuned models, which means more business impact.

The category is forming now

Foundation models for text went from "interesting research" (GPT-1, 2018) to "every enterprise needs this" (ChatGPT, 2022) in four years. Foundation models for structured data are at the beginning of this curve. The research is published (Relational Deep Learning at ICML 2024, RelBench at NeurIPS 2024, PluRel scaling laws). The benchmarks exist. The production deployments are generating real business results.

The companies that built GPT into their workflows early (not in 2024, but in 2020-2021 when it was still GPT-3) gained years of compound advantage. The same window exists now for relational foundation models. The data that runs your business is stored in relational databases. A model that reads those databases natively, without feature engineering, without training, is not an incremental improvement. It is a category shift.

Every prediction you want to make on enterprise data, from churn to fraud to demand to lifetime value, is a question about patterns in a relational graph. Foundation models are the first technology that can answer those questions directly.

Frequently asked questions

What is a foundation model for structured data?

A foundation model for structured data is a large neural network pre-trained on diverse relational databases that can generalize to new databases and prediction tasks without task-specific training. Analogous to how GPT was pre-trained on text and can answer questions on new topics, a relational foundation model like KumoRFM was pre-trained on billions of rows across thousands of databases and can make predictions on any relational database it has never seen before.

Why can't LLMs like GPT handle structured data well?

LLMs process data as text sequences, so tables must be serialized into text (JSON, CSV, markdown). This destroys the relational structure: foreign key relationships, multi-table joins, temporal ordering, and graph topology are all lost or distorted. On RelBench, Llama 3.2 3B achieved 68.06 AUROC on classification tasks, compared to 76.71 for KumoRFM, which operates on the relational graph structure natively. The gap is architectural, not a matter of model size.

What is KumoRFM?

KumoRFM is the first relational foundation model, developed by Kumo.ai. It is a graph transformer pre-trained on billions of rows across thousands of diverse relational databases. It represents any relational database as a temporal heterogeneous graph and uses attention mechanisms to learn universal predictive patterns. It delivers zero-shot predictions (no training required) on new databases via Predictive Query Language (PQL), achieving 76.71 AUROC on RelBench classification tasks.

What are PluRel scaling laws?

PluRel scaling laws, published by Kumo.ai researchers, demonstrate that relational foundation models exhibit power-law scaling similar to LLMs. Specifically, the loss follows L(N) = 0.07 * N^(-0.38) + 0.36, where N is the number of parameters. This means predictable performance improvements with increased model and data scale. The scaling exponent of -0.38 is comparable to the -0.34 observed for GPT-3, suggesting that relational data supports the same scale-driven gains as language data.

How does a relational foundation model differ from a tabular foundation model?

Tabular foundation models (TabPFN, CARTE, TabFM) operate on single flat tables. They cannot handle multi-table relational data, which is the native format of enterprise databases. Relational foundation models like KumoRFM represent the full multi-table database as a graph and learn patterns across tables, including multi-hop relationships and temporal sequences. This distinction matters because enterprise data is relational, not tabular.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.