Foundation models are the most significant shift in machine learning since deep learning went mainstream in 2012. GPT proved the concept for text. Now the same approach applies to structured enterprise data. These are the 15 questions we hear most often, answered directly.
1. What is a foundation model?
A foundation model is a large neural network pre-trained on broad, diverse data that can be adapted to many downstream tasks without task-specific training from scratch. The key insight is that patterns learned from massive datasets transfer to new, unseen tasks.
GPT-4 learned language patterns from trillions of tokens and can answer questions it was never specifically trained on. KumoRFM learned relational data patterns from 5,000+ databases and can make predictions on databases it has never seen. The same principle, applied to different data modalities.
2. How does a foundation model differ from a traditional ML model?
Traditional ML models are single-purpose. You train a churn model on churn data. You train a fraud model on fraud data. Each model starts from random initialization and learns only from its training dataset. If you need 10 predictions, you build 10 models.
Foundation models are general-purpose. One model handles many tasks. Pre-training gives it a head start: it already understands the patterns common to relational data (recency, frequency, temporal dynamics, graph topology). At inference time, it applies those patterns to your specific data and task.
3. What types of foundation models exist?
Three major categories serve different data types:
- Language foundation models (GPT-4, Claude, Llama): trained on text, used for generation, reasoning, and conversation. These are what most people mean when they say "AI" in 2026.
- Tabular foundation models: trained on collections of flat tables, used for classification and regression on single-table structured data. These handle one table at a time.
- Relational foundation models (KumoRFM): trained on multi-table relational databases, used for predictions that require cross-table patterns. These handle the full relational structure.
Foundation model types compared:
| Type | Input | Example | Tables Handled | RelBench AUROC |
|---|---|---|---|---|
| Language FM | Text tokens | GPT-4, Claude, Llama 3.2 | 1 (serialized) | 68.06 |
| Tabular FM | Single flat table | TabPFN, CARTE | 1 | ~65-68 |
| Relational FM | Multi-table database | KumoRFM | 3-50+ | 76.71 (zero-shot) |
Enterprise data spans 10-50 connected tables. Only relational FMs handle the full structure. Language FMs serialize tables as text, losing schema and numerical relationships.
4. Can LLMs make predictions on structured data?
They can try. You serialize a table as CSV or JSON, feed it to the LLM, and ask for a prediction. The RelBench benchmark tested this directly: Llama 3.2 3B achieved 68.06 AUROC on classification tasks. A supervised GNN achieved 75.83. KumoRFM zero-shot achieved 76.71.
LLMs underperform because they process data as text tokens, not as structured relationships. The number "42.50" in a price column is semantically different from "42.50" in a latitude column, but to an LLM, both are just token sequences. Relational foundation models encode data type, position in the schema, and relational context into every value.
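The type-erasure problem above can be seen in a few lines. This is a toy sketch (the row, column names, and encoding tuple are invented for illustration): after JSON serialization, a price and a latitude that share the same digits become identical character sequences, while a structured encoding keeps each value tied to its column, type, and table.

```python
import json

# A row from a multi-table database: typed columns with schema context.
order_row = {"order_id": 1017, "price_usd": 42.50, "latitude": 42.50}

# What an LLM sees after serialization: one undifferentiated string,
# where both 42.5 values become the same token sequence.
serialized = json.dumps(order_row)
print(serialized)

# A structured encoder keeps (value, dtype, column, table) together,
# so "42.50 as a price" stays distinct from "42.50 as a latitude".
encoded = [
    (value, type(value).__name__, column, "orders")
    for column, value in order_row.items()
]
print(encoded)
```

The point is not this particular tuple format but the contrast: text serialization discards exactly the schema metadata that a relational model conditions on.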
5. What is the difference between zero-shot and fine-tuned predictions?
Zero-shot predictions use only the knowledge from pre-training. You connect your database, define the task, and the model predicts using patterns it learned across its training corpus. No labeled data from your specific task is required.
Fine-tuning adapts the model to your specific data distribution using your historical labels. On RelBench, the gap is meaningful: 76.71 AUROC zero-shot vs. 81.14 fine-tuned. Fine-tuning is most valuable when your data has domain-specific patterns (proprietary product categories, industry-specific behaviors) that the pre-training corpus did not cover extensively.
Zero-shot vs. fine-tuned churn predictions (example):
| Customer | Zero-Shot Risk | Fine-Tuned Risk | Actual Outcome | Key Signal |
|---|---|---|---|---|
| C-2841 | 0.78 | 0.84 | Churned | Declining logins + open ticket |
| C-5510 | 0.22 | 0.18 | Retained | Steady usage pattern |
| C-9032 | 0.45 | 0.71 | Churned | Industry-specific seasonal drop |
| C-1177 | 0.61 | 0.38 | Retained | New product adoption (domain-specific) |
C-9032 and C-1177 are the cases where fine-tuning corrects zero-shot predictions: C-9032's industry has seasonal churn the pre-trained model underweights, and C-1177's new product adoption is a retention signal specific to this company.
6. How much data does a foundation model need?
For zero-shot: your database schema and current data. No historical labels needed. The model makes predictions based on structural patterns and the data itself.
For fine-tuning: 10,000 to 100,000 labeled examples are typically sufficient. This is 10x to 100x less than training a model from scratch, because the foundation model already understands relational patterns and only needs to adapt to your specific distribution.
7. How long does it take to get predictions?
Zero-shot: seconds to minutes after connecting your database. The Stanford study measured traditional ML at 12.3 hours and 878 lines of code per prediction task. Foundation models compress that to a single PQL query.
Fine-tuning: 2 to 8 hours depending on dataset size and number of tables. Compare this to 3 to 6 months for a traditional custom ML pipeline.
8. Are these predictions accurate enough for production?
Yes, with evidence at scale. KumoRFM zero-shot outperforms manual feature engineering on 11 of 12 RelBench classification tasks. In production deployments: DoorDash saw a 1.8% engagement lift across 30 million users. Snowflake saw a 3.2x expansion revenue lift. Databricks saw a 5.4x conversion lift. These are not toy experiments; they are production systems serving millions of predictions daily.
9. What is PQL?
PQL (Predictive Query Language) is the interface for making predictions with a relational foundation model. It extends SQL with a PREDICT clause. Instead of writing hundreds of lines of feature engineering code, you write one query that describes what you want to predict, for which entities, and over what time horizon.
Any team member who can write SQL can write PQL. This removes the bottleneck of requiring ML engineers for every prediction task.
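For illustration, a churn prediction might look like the query below. The syntax is approximated from Kumo's published PQL examples; treat the exact keywords, the aggregation window arguments, and the table and column names (`orders`, `customers.customer_id`) as placeholders to adapt to your schema.

```sql
-- For every customer, predict whether they place zero orders
-- in the next 30 days (a common churn definition).
PREDICT COUNT(orders.*, 0, 30, days) = 0
FOR EACH customers.customer_id
```

The query names the target, the entities, and the horizon; the feature engineering that would normally back such a prediction is handled by the model.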
10. How does a relational foundation model handle multiple tables?
It represents your database as a temporal heterogeneous graph. Each row becomes a node. Foreign keys become edges. Timestamps create temporal ordering. The model uses graph transformers to propagate information across this structure, learning which cross-table patterns are predictive for each task.
A 3-hop message pass on a customer node captures: the customer's attributes, their order history, the products in those orders, other customers who ordered those products, and those customers' behavior. This is exactly the kind of signal that manual feature engineering almost never captures.
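The graph construction above can be sketched in plain Python. This is a toy version under assumed table and column names (`customers`, `orders`, `products`): rows become nodes, foreign keys become edges, and a breadth-first walk shows what a k-hop message-passing neighborhood of one customer actually reaches.

```python
from collections import deque

# Toy rows: each row becomes a node identified by (table, primary key).
orders = [
    {"order_id": "O1", "customer_id": "C1", "product_id": "P1"},
    {"order_id": "O2", "customer_id": "C2", "product_id": "P1"},
]

# Undirected adjacency built from the foreign-key columns.
edges = {}
def link(a, b):
    edges.setdefault(a, set()).add(b)
    edges.setdefault(b, set()).add(a)

for o in orders:
    link(("orders", o["order_id"]), ("customers", o["customer_id"]))
    link(("orders", o["order_id"]), ("products", o["product_id"]))

def k_hop(node, k):
    """Nodes reachable within k hops: the receptive field of a
    k-layer message-passing model rooted at `node`."""
    seen, frontier = {node}, deque([(node, 0)])
    while frontier:
        cur, depth = frontier.popleft()
        if depth == k:
            continue
        for nxt in edges.get(cur, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

# Three hops from C1 reach C1's order, the shared product, and the
# co-purchasing customer's order O2 — cross-table signal that manual
# per-table features would miss.
print(sorted(k_hop(("customers", "C1"), 3)))
```

A real relational foundation model adds node features, edge types, and timestamp-based ordering on top of this skeleton, but the reachability structure is the same.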
11. Does it work with my existing infrastructure?
Relational foundation models connect to standard data warehouses and databases: Snowflake, BigQuery, PostgreSQL, Databricks, Redshift, and others. No data migration required. The model reads your data in place and writes predictions back to your preferred destination.
12. What does fine-tuning cost?
Building a custom ML pipeline: $150K to $500K per model (team of 3 to 5 data scientists, 3 to 6 months of work, plus infrastructure). Fine-tuning a foundation model: $5K to $20K per task (a few hours of GPU compute plus platform fee). The cost reduction is 10x to 50x.
13. Do foundation models work for all prediction tasks?
They work best on relational prediction tasks where the signal is distributed across connected tables: churn, fraud, recommendations, demand forecasting, lead scoring, lifetime value, next-best-action, credit risk. They add less value on tasks where the input is not relational (image classification, audio processing, protein folding).
14. How do I evaluate before committing?
Run a zero-shot benchmark. Pick 2 to 3 prediction tasks where you have existing models with known performance. Connect your database, run the predictions, and compare AUROC side by side. This takes hours, not months. If zero-shot matches or beats your current models, the business case is clear.
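The side-by-side comparison reduces to computing AUROC for both models on the same holdout labels. A minimal sketch (the labels and scores below are invented; in practice you would pull them from your holdout set), using the rank interpretation of AUROC rather than any library:

```python
def auroc(labels, scores):
    """AUROC = probability a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical holdout: 1 = churned. Same labels, two models' scores.
labels    = [1, 0, 1, 0, 1, 0, 0, 1]
incumbent = [0.62, 0.48, 0.55, 0.40, 0.35, 0.30, 0.58, 0.70]
zero_shot = [0.78, 0.22, 0.71, 0.38, 0.45, 0.25, 0.52, 0.80]

print(f"incumbent AUROC: {auroc(labels, incumbent):.3f}")
print(f"zero-shot AUROC: {auroc(labels, zero_shot):.3f}")
```

If the zero-shot number matches or beats the incumbent on your real holdout, you have the evidence the business case needs, before any fine-tuning spend.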
15. Will foundation models replace data scientists?
They replace the repetitive work: feature engineering (80% of time according to the Stanford study), model selection, hyperparameter tuning, and pipeline maintenance. They do not replace the strategic work: defining which problems to solve, designing evaluation frameworks, interpreting predictions for business stakeholders, and building the organizational systems that turn predictions into action. Data scientists become more valuable, not less, because they can focus on impact instead of plumbing.