Every few years, a new paradigm arrives and the discourse swings to extremes. LLMs will replace everything. Traditional ML is dead. The truth is always more specific.
Foundation models for structured data are real and they are good. KumoRFM zero-shot outperforms manually-engineered LightGBM on the RelBench benchmark by 14 AUROC points. But traditional ML is not dead. There are cases where a well-tuned XGBoost model on a carefully engineered feature table is the right answer.
The question is not which is better in the abstract. It is which is better for your specific data, team, and problem. This article walks through the actual differences with real numbers so you can make that call.
RelBench benchmark results
| approach | AUROC (classification) | time_per_task | feature_engineering |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours | 878 lines of code |
| LLM (Llama 3.2 3B) | 68.06 | Minutes | None (text serialization) |
| Supervised GNN (RDL) | 75.83 | ~30 min training | None |
| KumoRFM zero-shot | 76.71 | <1 second | None |
| KumoRFM fine-tuned | 81.14 | Minutes fine-tuning | None |
KumoRFM zero-shot outperforms all approaches without any task-specific training. Fine-tuning adds another 4.4 AUROC points. The 14-point gap between LightGBM and KumoRFM is the cost of information loss during flattening.
When to use which
| scenario | best_approach | why |
|---|---|---|
| Single flat table, mature features | Traditional ML | No multi-table structure to exploit |
| Multi-table, one high-value task | RDL (train GNN) | Worth the training investment |
| Multi-table, many tasks | Foundation model | Zero-shot across all tasks |
| Regulatory interpretability required | Traditional ML | Linear models are easier to audit |
| Speed to first prediction critical | Foundation model | Seconds vs weeks |
| Rapid prototyping / exploration | Foundation model | Test 40 questions in a day |
The choice depends on your data structure, number of tasks, and time constraints. Foundation models have the largest advantage when data spans multiple tables and you need predictions fast.
What traditional ML does well
Traditional ML (gradient boosted trees, logistic regression, random forests) has earned its place. For the right problem, it is fast, interpretable, and well-understood.
Single-table problems
If your prediction task lives in a single table with clean, well-understood features, traditional ML is hard to beat on efficiency. A credit scoring model built on a flat table of 50 pre-computed features does not benefit from multi-table graph learning because there is no multi-table structure to exploit.
On single-table datasets, gradient boosted trees still win most Kaggle tabular competitions. This is not surprising: these models were designed specifically for flat tabular data, and they have had 20 years of optimization for that format.
Well-engineered feature sets
If your company has invested years building a mature feature store with hundreds of carefully curated features, those features encode valuable domain knowledge. A traditional model trained on these features benefits from both statistical learning and human insight.
The challenge is the upfront cost: that feature store took years to build, and it is expensive to maintain. But if it already exists, the marginal cost of training another model on it is low.
Regulatory interpretability
In regulated industries like banking and insurance, model interpretability is sometimes a legal requirement. A logistic regression or small decision tree with 20 features is easy to explain. Every feature has a known coefficient or split point. Regulators can audit it. Customers can receive explanations.
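A minimal sketch of what that auditability looks like, using synthetic data and illustrative feature names:

```python
# Interpretability sketch: every logistic-regression feature maps to one
# signed coefficient a regulator can inspect. Data and names are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))  # stand-ins for income, utilization, age
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=1000) > 0).astype(int)

model = LogisticRegression().fit(X, y)
for name, coef in zip(["income", "utilization", "age"], model.coef_[0]):
    print(f"{name}: {coef:+.2f}")  # signed, auditable effect per feature
```

The entire model is those three numbers plus an intercept, which is exactly what makes it easy to explain to an auditor or a customer.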
Foundation models are not black boxes (feature importance and attention weights can be extracted), but the interpretability story is less mature than for linear models.
What foundation models change
Foundation models for relational data change three things fundamentally: how data is consumed, how long predictions take, and how many tasks a single model can handle.
1. Multi-table data without flattening
Enterprise databases are relational. Customers connect to orders connect to products connect to reviews connect to other customers. Traditional ML requires flattening this into a single table, which means choosing which tables to join, which columns to aggregate, and which time windows to apply. A Stanford study measured this at 12.3 hours and 878 lines of code per task.
Relational tables (e-commerce database)
| table | rows | example_columns | foreign_keys |
|---|---|---|---|
| customers | 500K | customer_id, segment, signup_date | — |
| orders | 12M | order_id, customer_id, total, date | customer_id |
| products | 80K | product_id, category, brand, price | — |
| order_items | 35M | item_id, order_id, product_id, qty | order_id, product_id |
| reviews | 4M | review_id, customer_id, product_id, stars | customer_id, product_id |
| support | 1.2M | ticket_id, customer_id, category, resolved_hrs | customer_id |
Six tables, 52M+ rows, multiple join paths. To predict churn, traditional ML must flatten this into one row per customer. The foundation model reads all six tables directly.
Flat feature table (what traditional ML produces from 6 tables)
| customer_id | orders_30d | avg_order_val | categories_bought | avg_review_stars | tickets_30d | days_since_order |
|---|---|---|---|---|---|---|
| C-101 | 4 | $67.30 | 5 | 4.2 | 0 | 3 |
| C-102 | 0 | $0 | 0 | — | 3 | 74 |
| C-103 | 6 | $45.80 | 8 | 3.8 | 1 | 1 |
52 million rows across 6 tables compressed into 7 columns per customer. The multi-hop pattern 'C-102 gave 1-star reviews to products that other high-value customers also reviewed poorly' is gone. The temporal escalation of C-102's support tickets is gone. Only aggregates survive.
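A pandas sketch of the flattening step makes the loss concrete. The toy rows below are illustrative stand-ins, not the exact tables above:

```python
# Flattening sketch: every join path collapses into per-customer aggregates.
import pandas as pd

customers = pd.DataFrame({"customer_id": ["C-101", "C-102"]})
orders = pd.DataFrame({
    "customer_id": ["C-101", "C-101", "C-101", "C-101"],
    "total": [70.0, 64.6, 66.0, 68.6],
})
reviews = pd.DataFrame({
    "customer_id": ["C-101", "C-102", "C-102"],
    "product_id": ["P-9", "P-3", "P-7"],
    "stars": [5, 1, 1],
})

flat = (
    customers
    .merge(orders.groupby("customer_id")
                 .agg(orders_cnt=("total", "size"),
                      avg_order_val=("total", "mean"))
                 .reset_index(), how="left", on="customer_id")
    .merge(reviews.groupby("customer_id")
                  .agg(avg_review_stars=("stars", "mean"))
                  .reset_index(), how="left", on="customer_id")
)
# product_id never reaches the flat table: which products C-102 rated
# poorly, and who else reviewed those products, cannot be recovered here.
```

Once `groupby().agg()` runs, the multi-hop structure is unrecoverable: the model downstream sees `avg_review_stars = 1.0` and nothing else.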
Foundation models skip this step entirely. KumoRFM represents the database as a temporal heterogeneous graph and learns patterns directly from the raw relational structure. No flattening, no feature engineering, no information loss.
The impact on accuracy is significant. On RelBench classification tasks, LightGBM with manual features scores 62.44 AUROC. KumoRFM zero-shot scores 76.71. The gap is not due to a better model architecture. It is due to the model seeing the full relational structure instead of a lossy summary.
2. Zero-shot prediction
Traditional ML requires training a new model for every prediction task. Want to predict churn? Train a model. Upsell? Train another. Fraud? Another. Each model needs its own feature engineering, data pipeline, training run, and deployment.
A foundation model is pre-trained. It has learned universal patterns (recency, frequency, temporal dynamics, graph topology) from thousands of diverse databases. At inference time, you point it at your data and ask a question. No training required.
This changes the economics of ML. Instead of a 2-month project per prediction task, you get answers in seconds. A team that could build 4 models per quarter can now explore 40 prediction questions in a day.
3. Cross-task generalization
Traditional ML models are narrow. A churn model knows nothing about fraud. A recommendation model knows nothing about demand forecasting. Each model starts from scratch.
Foundation models transfer knowledge across tasks. The patterns that predict churn (declining engagement, increasing support load, behavioral shifts) are structurally similar to the patterns that predict other outcomes. A model that has seen these patterns across thousands of databases recognizes them in yours immediately.
Traditional ML
- Requires flat feature table as input
- 12.3 hours of feature engineering per task
- One model per prediction task
- Strong on single-table, well-featured data
- Mature interpretability tooling
Foundation model (KumoRFM)
- Reads raw relational tables directly
- Zero feature engineering required
- One model handles any prediction task
- 14+ AUROC points better on multi-table data
- Seconds to first prediction, not weeks
The benchmark evidence
RelBench is the standard benchmark for ML on relational data. It includes 7 databases, 30 prediction tasks, and over 103 million rows. The results tell a clear story:
| Approach | AUROC (classification) | Time per task |
|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours |
| LLM (Llama 3.2 3B) | 68.06 | Minutes (but poor accuracy) |
| Supervised GNN (RDL) | 75.83 | ~30 minutes training |
| KumoRFM zero-shot | 76.71 | <1 second |
| KumoRFM fine-tuned | 81.14 | Minutes of fine-tuning |
Two things stand out. First, the gap between manual features (62.44) and the graph-based approaches (75-81) is large. This is the cost of information loss during flattening. Second, KumoRFM zero-shot (no task-specific training) already outperforms a supervised GNN that was trained specifically for each task.
PQL Query
PREDICT upsell_probability FOR EACH customers.customer_id WHERE customers.plan = 'basic'
With a foundation model, a new prediction task is a new query, not a new project. The same model that predicts churn also predicts upsell, fraud, demand, and any other relational question. No feature engineering, no retraining.
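As a sketch of that cost structure: the class and method names below are hypothetical stand-ins for a foundation-model client, not the actual KumoRFM SDK.

```python
# Hypothetical client sketch: the per-task artifact is a query string,
# not a pipeline. FoundationModelClient and its methods are illustrative.
class FoundationModelClient:
    def __init__(self, connection_string):
        # Points at the raw relational tables; no flattening step exists.
        self.connection_string = connection_string

    def predict(self, pql_query):
        # A real client would run zero-shot inference here; this stub only
        # shows that a new task adds a query, nothing more.
        return {"query": pql_query, "status": "submitted"}

client = FoundationModelClient("postgres://localhost/ecommerce")  # hypothetical DSN
churn = client.predict("PREDICT churn FOR EACH customers.customer_id")
upsell = client.predict(
    "PREDICT upsell_probability FOR EACH customers.customer_id "
    "WHERE customers.plan = 'basic'"
)
# Same model, same connection: only the question changed.
```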
Output
| customer_id | upsell_prob | recommended_plan | top_signal |
|---|---|---|---|
| C-8801 | 0.89 | Enterprise | Usage exceeds plan limits 3x/week |
| C-8802 | 0.72 | Pro | Team size grew from 3 to 8 in 60 days |
| C-8803 | 0.31 | Pro | Moderate usage, no growth signals |
| C-8804 | 0.14 | Stay Basic | Low engagement, price-sensitive segment |
What stays the same
Foundation models do not change everything. Several fundamentals of ML remain exactly the same.
Data quality still matters
A foundation model that reads garbage tables produces garbage predictions. Missing values, incorrect timestamps, broken foreign keys, duplicated rows, and stale data all degrade performance. The model is better at handling messy data than manual feature engineering (which often breaks on edge cases), but it is not magic.
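A few sanity checks catch the most common relational defects before any model, traditional or foundation, sees the data. The toy rows below are illustrative:

```python
# Relational sanity checks: broken foreign keys and missing timestamps
# degrade predictions regardless of which model family consumes the data.
import pandas as pd

customers = pd.DataFrame({"customer_id": ["C-101", "C-102", "C-103"]})
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": ["C-101", "C-999", "C-103"],  # C-999 is a broken FK
    "date": pd.to_datetime(["2024-05-01", None, "2024-05-03"]),
})

broken_fk = ~orders["customer_id"].isin(customers["customer_id"])
missing_ts = orders["date"].isna()

print(f"orders with broken foreign keys: {broken_fk.sum()}")
print(f"orders with missing timestamps:  {missing_ts.sum()}")
```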
Problem framing still matters
Choosing the right prediction target, the right entity, the right time horizon, and the right evaluation metric still requires human judgment. A foundation model can predict churn in 1 second, but deciding whether to predict 30-day churn or 90-day churn, and what to do about the predictions, is still a business decision.
Deployment still matters
Getting predictions into production systems, monitoring model performance, handling drift, and integrating with downstream workflows are engineering problems that exist regardless of how the model was built.
Evaluation still matters
You still need holdout sets, proper temporal splits, and meaningful metrics. A foundation model makes it faster to get predictions, but you still need to verify they are good before acting on them.
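A proper temporal split is a few lines regardless of which model family you evaluate. The toy events table below is illustrative:

```python
# Temporal split sketch: train on history before a cutoff, evaluate after.
# A random split would leak future behavior into training.
import pandas as pd

events = pd.DataFrame({
    "customer_id": ["C-101", "C-102", "C-103", "C-101", "C-102"],
    "date": pd.to_datetime(
        ["2024-01-05", "2024-02-10", "2024-03-01", "2024-04-12", "2024-05-20"]
    ),
    "label": [0, 0, 1, 1, 0],
})

cutoff = pd.Timestamp("2024-04-01")
train = events[events["date"] < cutoff]
test = events[events["date"] >= cutoff]
# Every test-set label lies strictly after everything seen in training.
```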
Decision framework
Here is a practical framework for choosing between the two approaches.
Use traditional ML when
- Your data lives in a single table with pre-computed features
- You have a mature feature store that already covers this use case
- Strict regulatory interpretability requirements apply (linear models only)
- The prediction task is simple, well-understood, and has been stable for years
Use a foundation model when
- Your data spans multiple relational tables
- You need predictions fast (days, not months)
- You have many prediction tasks and cannot afford to build custom pipelines for each one
- Your data science team spends most of their time on feature engineering instead of business problems
- You suspect there are predictive signals in cross-table relationships that your current features do not capture
Use both when
- You have existing high-value models in production that work well, and you want to use foundation models for new prediction tasks or rapid prototyping
- You want to benchmark your current approach against a foundation model to quantify the accuracy gap
- You are migrating incrementally, keeping proven models while building new ones with the foundation model approach
The real shift
The shift from traditional ML to foundation models for structured data is not about better algorithms. It is about removing a structural bottleneck.
For two decades, the ML pipeline has required converting relational data into flat tables. This conversion is lossy (multi-hop patterns disappear), slow (12.3 hours per task), and scales linearly with the number of prediction tasks. Every new question means another round of feature engineering.
Foundation models remove that conversion step. They read relational data directly. The result is faster predictions, higher accuracy on multi-table data, and a fundamentally different cost structure where the marginal cost of a new prediction task approaches zero.
Traditional ML is not dead. But the set of problems where it is the best answer is shrinking. If your data is relational and your bottleneck is feature engineering, the foundation model approach is not just faster. It is better.