
Foundation Models vs Traditional ML: What Changes and What Doesn't

Foundation models are reshaping how ML works on structured data. But the shift is not universal. Here's an honest comparison: where foundation models win, where traditional ML still holds, and how to decide.

TL;DR

  • On RelBench (7 databases, 30 tasks, 103M+ rows): LightGBM 62.44, LLM (Llama 3.2 3B) 68.06, supervised GNN 75.83, KumoRFM zero-shot 76.71, KumoRFM fine-tuned 81.14 AUROC. The 14-point gap between LightGBM and zero-shot KumoRFM is information loss from flattening.
  • Traditional ML remains competitive for single-table problems, mature feature stores, and strict regulatory interpretability requirements. It is not dead, but the set of problems where it wins is shrinking.
  • The economics change fundamentally. Instead of a 2-month project per prediction task (12.3 hours and 878 lines of feature code), you get answers in seconds. A team building 4 models per quarter can explore 40 questions in a day.
  • Data quality, problem framing, evaluation, and deployment still require human judgment. Foundation models eliminate the feature engineering bottleneck (80% of data science time), not the need for data scientists.
  • Fine-tuning bridges the last accuracy gap: 81.14 AUROC, 18.7 points above LightGBM with manual features. For high-value tasks, fine-tuning adds domain-specific signal on top of pre-trained relational knowledge.

Every few years, a new paradigm arrives and the discourse swings to extremes. LLMs will replace everything. Traditional ML is dead. The truth is always more specific.

Foundation models for structured data are real and they are good. KumoRFM zero-shot outperforms manually-engineered LightGBM on the RelBench benchmark by 14 AUROC points. But traditional ML is not dead. There are cases where a well-tuned XGBoost model on a carefully engineered feature table is the right answer.

The question is not which is better in the abstract. It is which is better for your specific data, team, and problem. This article walks through the actual differences with real numbers so you can make that call.

RelBench benchmark results

| Approach | AUROC (classification) | Time per task | Feature engineering |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours | 878 lines of code |
| LLM (Llama 3.2 3B) | 68.06 | Minutes | None (text serialization) |
| Supervised GNN (RDL) | 75.83 | ~30 min training | None |
| KumoRFM zero-shot | 76.71 | <1 second | None |
| KumoRFM fine-tuned | 81.14 | Minutes of fine-tuning | None |

Highlighted: KumoRFM zero-shot outperforms every trained baseline without any task-specific training. Fine-tuning adds another 4.4 AUROC points. The 14-point gap between LightGBM and zero-shot KumoRFM is the cost of information loss during flattening.

When to use which

| Scenario | Best approach | Why |
|---|---|---|
| Single flat table, mature features | Traditional ML | No multi-table structure to exploit |
| Multi-table, one high-value task | RDL (train GNN) | Worth the training investment |
| Multi-table, many tasks | Foundation model | Zero-shot across all tasks |
| Regulatory interpretability required | Traditional ML | Linear models are easier to audit |
| Speed to first prediction critical | Foundation model | Seconds vs weeks |
| Rapid prototyping / exploration | Foundation model | Test 40 questions in a day |

The choice depends on your data structure, number of tasks, and time constraints. Foundation models have the largest advantage when data spans multiple tables and you need predictions fast.

What traditional ML does well

Traditional ML (gradient boosted trees, logistic regression, random forests) has earned its place. For the right problem, it is fast, interpretable, and well-understood.

Single-table problems

If your prediction task lives in a single table with clean, well-understood features, traditional ML is hard to beat on efficiency. A credit scoring model built on a flat table of 50 pre-computed features does not benefit from multi-table graph learning because there is no multi-table structure to exploit.

On single-table datasets, gradient boosted trees still win most Kaggle competitions and tabular benchmarks. This is not surprising: these models were designed specifically for flat tabular data, and they have had 20 years of optimization for that format.

Well-engineered feature sets

If your company has invested years building a mature feature store with hundreds of carefully curated features, those features encode valuable domain knowledge. A traditional model trained on these features benefits from both statistical learning and human insight.

The challenge is the upfront cost: that feature store took years to build, and it is expensive to maintain. But if it already exists, the marginal cost of training another model on it is low.

Regulatory interpretability

In regulated industries like banking and insurance, model interpretability is sometimes a legal requirement. A logistic regression or small decision tree with 20 features is easy to explain. Every feature has a known coefficient or split point. Regulators can audit it. Customers can receive explanations.
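
To make the auditability point concrete, here is a minimal sketch of the kind of model regulators can inspect. The coefficients, feature names, and applicant values are hypothetical; the point is that every feature's effect is a single visible number, and per-feature contributions double as customer-facing reason codes.

```python
import math

# Hypothetical coefficients for a small, auditable credit model.
# Every feature's effect on the log-odds is one inspectable number.
COEFFS = {"utilization": 2.1, "late_payments": 0.9, "account_age_years": -0.3}
INTERCEPT = -1.5

def default_probability(features):
    """Logistic regression score: sigmoid of a weighted sum."""
    z = INTERCEPT + sum(COEFFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def reason_codes(features):
    """Per-feature contribution to the log-odds, largest first --
    the kind of explanation a regulator or customer can receive."""
    contribs = {name: COEFFS[name] * value for name, value in features.items()}
    return sorted(contribs.items(), key=lambda kv: -abs(kv[1]))

applicant = {"utilization": 0.8, "late_payments": 2, "account_age_years": 4}
p = default_probability(applicant)
```

Nothing here requires tooling: the audit trail is the model itself, which is exactly the property foundation models cannot yet match with equal maturity.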

Foundation models are not black boxes (feature importance and attention weights can be extracted), but the interpretability story is less mature than for linear models.

What foundation models change

Foundation models for relational data change three things fundamentally: how data is consumed, how long predictions take, and how many tasks a single model can handle.

1. Multi-table data without flattening

Enterprise databases are relational. Customers connect to orders connect to products connect to reviews connect to other customers. Traditional ML requires flattening this into a single table, which means choosing which tables to join, which columns to aggregate, and which time windows to apply. A Stanford study measured this at 12.3 hours and 878 lines of code per task.

Relational tables (e-commerce database)

| Table | Rows | Example columns | Foreign keys |
|---|---|---|---|
| customers | 500K | customer_id, segment, signup_date | |
| orders | 12M | order_id, customer_id, total, date | customer_id |
| products | 80K | product_id, category, brand, price | |
| order_items | 35M | item_id, order_id, product_id, qty | order_id, product_id |
| reviews | 4M | review_id, customer_id, product_id, stars | customer_id, product_id |
| support | 1.2M | ticket_id, customer_id, category, resolved_hrs | customer_id |

Six tables, 52M+ rows, multiple join paths. To predict churn, traditional ML must flatten this into one row per customer. The foundation model reads all six tables directly.
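
The flattening step can be sketched in a few lines. This toy version uses hypothetical rows from just one of the six tables (orders) and shows why the output is lossy: everything that is not an aggregate is discarded.

```python
from datetime import date

# Toy rows from the orders table (hypothetical data).
orders = [
    {"customer_id": "C-101", "total": 80.0, "date": date(2024, 5, 20)},
    {"customer_id": "C-101", "total": 54.6, "date": date(2024, 5, 28)},
    {"customer_id": "C-102", "total": 120.0, "date": date(2024, 3, 1)},
]
customers = ["C-101", "C-102"]
TODAY = date(2024, 6, 1)
WINDOW_DAYS = 30

def flatten(customers, orders):
    """Collapse the orders table into one row of aggregates per customer.
    Order sequences, item detail, and cross-table paths do not survive
    this step -- only the aggregates do."""
    rows = {}
    for c in customers:
        mine = [o for o in orders if o["customer_id"] == c]
        recent = [o for o in mine if (TODAY - o["date"]).days <= WINDOW_DAYS]
        avg = round(sum(o["total"] for o in recent) / len(recent), 2) if recent else 0.0
        rows[c] = {
            "orders_30d": len(recent),
            "avg_order_val": avg,
            "days_since_order": min((TODAY - o["date"]).days for o in mine) if mine else None,
        }
    return rows

flat = flatten(customers, orders)
```

Multiply this by six tables, multiple join paths, and per-task time windows, and the 12.3 hours and 878 lines per task become plausible.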

Flat feature table (what traditional ML produces from 6 tables)

| customer_id | orders_30d | avg_order_val | categories_bought | avg_review_stars | tickets_30d | days_since_order |
|---|---|---|---|---|---|---|
| C-101 | 4 | $67.30 | 5 | 4.2 | 0 | 3 |
| C-102 | 0 | $0 | 0 | | 3 | 74 |
| C-103 | 6 | $45.80 | 8 | 3.8 | 1 | 1 |

52 million rows across 6 tables compressed into 7 columns per customer. The multi-hop pattern 'C-102 gave 1-star reviews to products that other high-value customers also reviewed poorly' is gone. The temporal escalation of C-102's support tickets is gone. Only aggregates survive.

Foundation models skip this step entirely. KumoRFM represents the database as a temporal heterogeneous graph and learns patterns directly from the raw relational structure. No flattening, no feature engineering, no information loss.
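
A minimal sketch of the graph idea (illustrative only, not KumoRFM's actual internals): rows become typed nodes and foreign keys become edges, so multi-hop patterns like the 1-star co-review signal above stay traversable instead of being averaged away.

```python
# Illustrative sketch: a relational database as a typed graph.
# Foreign keys become edges, so multi-hop paths remain reachable.
# (Hypothetical rows; not KumoRFM's internal representation.)
reviews = [
    {"review_id": "R-1", "customer_id": "C-102", "product_id": "P-7", "stars": 1},
    {"review_id": "R-2", "customer_id": "C-555", "product_id": "P-7", "stars": 1},
]

edges = []  # (source_node, edge_type, target_node)
for r in reviews:
    edges.append((("customer", r["customer_id"]), "wrote", ("review", r["review_id"])))
    edges.append((("review", r["review_id"]), "about", ("product", r["product_id"])))

def neighbors(node, edge_type=None):
    """All nodes connected to `node`, optionally filtered by edge type."""
    out = [t for s, e, t in edges if s == node and (edge_type is None or e == edge_type)]
    out += [s for s, e, t in edges if t == node and (edge_type is None or e == edge_type)]
    return out

# Two-hop path: which customers reviewed the same product as C-102's review?
product = neighbors(("review", "R-1"), "about")[0]
co_reviewers = {
    n for review in neighbors(product, "about")
    for n in neighbors(review, "wrote") if n[0] == "customer"
}
```

The two-hop query at the end is precisely the kind of pattern a flat per-customer row cannot express.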

The impact on accuracy is significant. On RelBench classification tasks, LightGBM with manual features scores 62.44 AUROC. KumoRFM zero-shot scores 76.71. The gap is not due to a better model architecture. It is due to the model seeing the full relational structure instead of a lossy summary.

2. Zero-shot prediction

Traditional ML requires training a new model for every prediction task. Want to predict churn? Train a model. Upsell? Train another. Fraud? Another. Each model needs its own feature engineering, data pipeline, training run, and deployment.

A foundation model is pre-trained. It has learned universal patterns (recency, frequency, temporal dynamics, graph topology) from thousands of diverse databases. At inference time, you point it at your data and ask a question. No training required.

This changes the economics of ML. Instead of a 2-month project per prediction task, you get answers in seconds. A team that could build 4 models per quarter can now explore 40 prediction questions in a day.

3. Cross-task generalization

Traditional ML models are narrow. A churn model knows nothing about fraud. A recommendation model knows nothing about demand forecasting. Each model starts from scratch.

Foundation models transfer knowledge across tasks. The patterns that predict churn (declining engagement, increasing support load, behavioral shifts) are structurally similar to the patterns that predict other outcomes. A model that has seen these patterns across thousands of databases recognizes them in yours immediately.

Traditional ML

  • Requires flat feature table as input
  • 12.3 hours of feature engineering per task
  • One model per prediction task
  • Strong on single-table, well-featured data
  • Mature interpretability tooling

Foundation model (KumoRFM)

  • Reads raw relational tables directly
  • Zero feature engineering required
  • One model handles any prediction task
  • 14+ AUROC points better on multi-table data
  • Seconds to first prediction, not weeks

The benchmark evidence

RelBench is the standard benchmark for ML on relational data. It includes 7 databases, 30 prediction tasks, and over 103 million rows. The results tell a clear story:

| Approach | AUROC (classification) | Time per task |
|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours |
| LLM (Llama 3.2 3B) | 68.06 | Minutes (but poor accuracy) |
| Supervised GNN (RDL) | 75.83 | ~30 minutes training |
| KumoRFM zero-shot | 76.71 | <1 second |
| KumoRFM fine-tuned | 81.14 | Minutes of fine-tuning |

Two things stand out. First, the gap between manual features (62.44) and the graph-based approaches (75-81) is large. This is the cost of information loss during flattening. Second, KumoRFM zero-shot (no task-specific training) already outperforms a supervised GNN that was trained specifically for each task.

PQL Query

PREDICT upsell_probability
FOR EACH customers.customer_id
WHERE customers.plan = 'basic'

With a foundation model, a new prediction task is a new query, not a new project. The same model that predicts churn also predicts upsell, fraud, demand, and any other relational question. No feature engineering, no retraining.

Output

| customer_id | upsell_prob | recommended_plan | top_signal |
|---|---|---|---|
| C-8801 | 0.89 | Enterprise | Usage exceeds plan limits 3x/week |
| C-8802 | 0.72 | Pro | Team size grew from 3 to 8 in 60 days |
| C-8803 | 0.31 | Pro | Moderate usage, no growth signals |
| C-8804 | 0.14 | Stay Basic | Low engagement, price-sensitive segment |

What stays the same

Foundation models do not change everything. Several fundamentals of ML remain exactly the same.

Data quality still matters

A foundation model that reads garbage tables produces garbage predictions. Missing values, incorrect timestamps, broken foreign keys, duplicated rows, and stale data all degrade performance. The model is better at handling messy data than manual feature engineering (which often breaks on edge cases), but it is not magic.

Problem framing still matters

Choosing the right prediction target, the right entity, the right time horizon, and the right evaluation metric still requires human judgment. A foundation model can predict churn in 1 second, but deciding whether to predict 30-day churn or 90-day churn, and what to do about the predictions, is still a business decision.
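
A small sketch of why the horizon choice matters, using hypothetical activity dates: the same customer can be "churned" under a 30-day definition and "active" under a 90-day one, and no model can make that call for you.

```python
from datetime import date

# Hypothetical last-activity dates per customer.
last_active = {
    "C-101": date(2024, 5, 28),
    "C-102": date(2024, 3, 1),
    "C-103": date(2024, 4, 20),
}
AS_OF = date(2024, 6, 1)

def churn_label(customer_id, horizon_days):
    """1 if the customer had no activity within `horizon_days` of AS_OF.
    Computing the label is trivial; choosing horizon_days is a business
    decision the model cannot make for you."""
    return int((AS_OF - last_active[customer_id]).days > horizon_days)
```

C-103 (last active 42 days ago) flips from churned to retained when the horizon moves from 30 to 90 days, which changes the training labels, the evaluation, and the intervention you would run.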

Deployment still matters

Getting predictions into production systems, monitoring model performance, handling drift, and integrating with downstream workflows are engineering problems that exist regardless of how the model was built.

Evaluation still matters

You still need holdout sets, proper temporal splits, and meaningful metrics. A foundation model makes it faster to get predictions, but you still need to verify they are good before acting on them.
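
A temporal split is the part most often gotten wrong, so here is a minimal sketch with hypothetical events: split on a time cutoff, never a random shuffle, or future information leaks into the evaluation.

```python
from datetime import date

# Hypothetical labeled events with timestamps.
events = [
    {"id": 1, "ts": date(2024, 1, 5), "label": 0},
    {"id": 2, "ts": date(2024, 2, 9), "label": 1},
    {"id": 3, "ts": date(2024, 3, 2), "label": 0},
    {"id": 4, "ts": date(2024, 4, 18), "label": 1},
]

def temporal_split(events, cutoff):
    """Everything before the cutoff is available for training or
    calibration; everything at or after it is held out. A random
    shuffle here would leak future information into the evaluation."""
    train = [e for e in events if e["ts"] < cutoff]
    holdout = [e for e in events if e["ts"] >= cutoff]
    return train, holdout

train, holdout = temporal_split(events, cutoff=date(2024, 3, 1))
```

This applies identically to zero-shot predictions: the fact that no training ran does not exempt the evaluation from temporal discipline.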

Decision framework

Here is a practical framework for choosing between the two approaches.

Use traditional ML when

  • Your data lives in a single table with pre-computed features
  • You have a mature feature store that already covers this use case
  • Strict regulatory interpretability requirements apply (linear models only)
  • The prediction task is simple, well-understood, and has been stable for years

Use a foundation model when

  • Your data spans multiple relational tables
  • You need predictions fast (days, not months)
  • You have many prediction tasks and cannot afford to build custom pipelines for each one
  • Your data science team spends most of their time on feature engineering instead of business problems
  • You suspect there are predictive signals in cross-table relationships that your current features do not capture

Use both when

  • You have existing high-value models in production that work well, and you want to use foundation models for new prediction tasks or rapid prototyping
  • You want to benchmark your current approach against a foundation model to quantify the accuracy gap
  • You are migrating incrementally, keeping proven models while building new ones with the foundation model approach

The real shift

The shift from traditional ML to foundation models for structured data is not about better algorithms. It is about removing a structural bottleneck.

For two decades, the ML pipeline has required converting relational data into flat tables. This conversion is lossy (multi-hop patterns disappear), slow (12.3 hours per task), and scales linearly with the number of prediction tasks. Every new question means another round of feature engineering.

Foundation models remove that conversion step. They read relational data directly. The result is faster predictions, higher accuracy on multi-table data, and a fundamentally different cost structure where the marginal cost of a new prediction task approaches zero.

Traditional ML is not dead. But the set of problems where it is the best answer is shrinking. If your data is relational and your bottleneck is feature engineering, the foundation model approach is not just faster. It is better.

Frequently asked questions

Are foundation models always better than traditional ML for structured data?

No. For single-table problems with well-engineered features, gradient boosted trees (XGBoost, LightGBM) remain competitive and sometimes win. Foundation models have the largest advantage on multi-table relational data where feature engineering is the bottleneck. On the RelBench benchmark, KumoRFM zero-shot outperforms LightGBM with manual features by 14 AUROC points on average across classification tasks.

Do foundation models for structured data work like LLMs?

They share the same conceptual idea (pre-train on large data, then apply to specific tasks), but the architecture is fundamentally different. LLMs use transformers trained on token sequences. Relational foundation models like KumoRFM use graph transformers trained on temporal heterogeneous graphs derived from relational databases. The training objective is relational pattern learning, not next-token prediction.

Can I fine-tune a foundation model on my own data?

Yes. KumoRFM supports fine-tuning on your specific database, which pushes accuracy from 76.71 to 81.14 AUROC on the RelBench benchmark. Fine-tuning lets the model adapt its pre-trained relational knowledge to the specific patterns in your data, similar to how fine-tuning GPT on domain text improves performance.

What is the main advantage of foundation models over traditional ML?

Speed to first prediction. Traditional ML requires weeks of feature engineering per task. A foundation model delivers predictions from raw relational data in seconds, zero-shot, with no feature engineering or model training. For enterprises with dozens of prediction tasks, this compresses months of work into hours.

Will foundation models replace data scientists?

No. They replace the most tedious part of a data scientist's work: feature engineering, which consumes 80% of their time. Data scientists are still needed for problem framing, data quality, result interpretation, deployment strategy, and the business judgment that turns predictions into decisions. Foundation models free them to spend time on these higher-value activities.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.