Foundation models are the most significant shift in machine learning since deep learning went mainstream in 2012. GPT proved the concept for text. Now the same approach applies to structured enterprise data. These are the 15 questions we hear most often, answered directly.
1. What is a foundation model?
A foundation model is a large neural network pre-trained on broad, diverse data that can be adapted to many downstream tasks without task-specific training from scratch. The key insight is that patterns learned from massive datasets transfer to new, unseen tasks.
GPT-4 learned language patterns from trillions of tokens and can answer questions it was never specifically trained on. KumoRFM learned relational data patterns from 5,000+ databases and can make predictions on databases it has never seen. The same principle, applied to different data modalities.
2. How does a foundation model differ from a traditional ML model?
Traditional ML models are single-purpose. You train a churn model on churn data. You train a fraud model on fraud data. Each model starts from random initialization and learns only from its training dataset. If you need 10 predictions, you build 10 models.
Foundation models are general-purpose. One model handles many tasks. Pre-training gives it a head start: it already understands the patterns common to relational data (recency, frequency, temporal dynamics, graph topology). At inference time, it applies those patterns to your specific data and task.
3. What types of foundation models exist?
Three major categories serve different data types:
- Language foundation models (GPT-4, Claude, Llama): trained on text, used for generation, reasoning, and conversation. These are what most people mean when they say "AI" in 2026.
- Tabular foundation models: trained on collections of flat tables, used for classification and regression on single-table structured data. These handle one table at a time.
- Relational foundation models (KumoRFM): trained on multi-table relational databases, used for predictions that require cross-table patterns. These handle the full relational structure.
Foundation model types compared:
| Type | Input | Example | Tables Handled | RelBench AUROC |
|---|---|---|---|---|
| Language FM | Text tokens | GPT-4, Claude, Llama 3.2 | 1 (serialized) | 68.06 |
| Tabular FM | Single flat table | TabPFN, CARTE | 1 | ~65-68 |
| Relational FM | Multi-table database | KumoRFM | 3-50+ | 76.71 (zero-shot) |
Enterprise data spans 10-50 connected tables. Only relational FMs handle the full structure. Language FMs serialize tables as text, losing schema and numerical relationships.
4. Can LLMs make predictions on structured data?
They can try. You serialize a table as CSV or JSON, feed it to the LLM, and ask for a prediction. The RelBench benchmark tested this directly: Llama 3.2 3B achieved 68.06 AUROC on classification tasks. A supervised GNN achieved 75.83. KumoRFM zero-shot achieved 76.71.
LLMs underperform because they process data as text tokens, not as structured relationships. The number "42.50" in a price column is semantically different from "42.50" in a latitude column, but to an LLM, both are just token sequences. Relational foundation models encode data type, position in the schema, and relational context into every value.
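The type-erasure problem above can be seen in a few lines. This is a toy sketch (the row, column names, and encoding tuple are invented for illustration): after JSON serialization, a price and a latitude that share the same digits become identical character sequences, while a structured encoding keeps each value tied to its column, type, and table.

```python
import json

# A row from a multi-table database: typed columns with schema context.
order_row = {"order_id": 1017, "price_usd": 42.50, "latitude": 42.50}

# What an LLM sees after serialization: one undifferentiated string,
# where both 42.5 values become the same token sequence.
serialized = json.dumps(order_row)
print(serialized)

# A structured encoder keeps (value, dtype, column, table) together,
# so "42.50 as a price" stays distinct from "42.50 as a latitude".
encoded = [
    (value, type(value).__name__, column, "orders")
    for column, value in order_row.items()
]
print(encoded)
```

The point is not this particular tuple format but the contrast: text serialization discards exactly the schema metadata that a relational model conditions on.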
5. What is the difference between zero-shot and fine-tuned predictions?
Zero-shot predictions use only the knowledge from pre-training. You connect your database, define the task, and the model predicts using patterns it learned across its training corpus. No labeled data from your specific task is required.
Fine-tuning adapts the model to your specific data distribution using your historical labels. On RelBench, the gap is meaningful: 76.71 AUROC zero-shot vs. 81.14 fine-tuned. Fine-tuning is most valuable when your data has domain-specific patterns (proprietary product categories, industry-specific behaviors) that the pre-training corpus did not cover extensively.
Zero-shot vs. fine-tuned churn predictions (example):
| Customer | Zero-Shot Risk | Fine-Tuned Risk | Actual Outcome | Key Signal |
|---|---|---|---|---|
| C-2841 | 0.78 | 0.84 | Churned | Declining logins + open ticket |
| C-5510 | 0.22 | 0.18 | Retained | Steady usage pattern |
| C-9032 | 0.45 | 0.71 | Churned | Industry-specific seasonal drop |
| C-1177 | 0.61 | 0.38 | Retained | New product adoption (domain-specific) |
C-9032 and C-1177 are the cases where fine-tuning corrects zero-shot predictions: C-9032's industry has seasonal churn the pre-trained model underweights, and C-1177's new product adoption is a retention signal specific to this company.
6. How much data does a foundation model need?
For zero-shot: your database schema and current data. No historical labels needed. The model makes predictions based on structural patterns and the data itself.
For fine-tuning: 10,000 to 100,000 labeled examples are typically sufficient. This is 10x to 100x less than training a model from scratch, because the foundation model already understands relational patterns and only needs to adapt to your specific distribution.
7. How long does it take to get predictions?
Zero-shot: seconds to minutes after connecting your database. The Stanford study measured traditional ML at 12.3 hours and 878 lines of code per prediction task. Foundation models compress that to a single PQL query.
Fine-tuning: 2 to 8 hours depending on dataset size and number of tables. Compare this to 3 to 6 months for a traditional custom ML pipeline.
8. Are these predictions accurate enough for production?
Yes, with evidence at scale. KumoRFM zero-shot outperforms manual feature engineering on 11 of 12 RelBench classification tasks. In production deployments: DoorDash saw a 1.8% engagement lift across 30 million users. Snowflake saw a 3.2x expansion revenue lift. Databricks saw a 5.4x conversion lift. These are not toy experiments; they are production systems serving millions of predictions daily.
9. What is PQL?
PQL (Predictive Query Language) is the interface for making predictions with a relational foundation model. It extends SQL with a PREDICT clause. Instead of writing hundreds of lines of feature engineering code, you write one query that describes what you want to predict, for which entities, and over what time horizon.
Any team member who can write SQL can write PQL. This removes the bottleneck of requiring ML engineers for every prediction task.
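For illustration, a churn prediction might look like the query below. The syntax is approximated from Kumo's published PQL examples; treat the exact keywords, the aggregation window arguments, and the table and column names (`orders`, `customers.customer_id`) as placeholders to adapt to your schema.

```sql
-- For every customer, predict whether they place zero orders
-- in the next 30 days (a common churn definition).
PREDICT COUNT(orders.*, 0, 30, days) = 0
FOR EACH customers.customer_id
```

The query names the target, the entities, and the horizon; the feature engineering that would normally back such a prediction is handled by the model.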
10. How does a relational foundation model handle multiple tables?
It represents your database as a temporal heterogeneous graph. Each row becomes a node. Foreign keys become edges. Timestamps create temporal ordering. The model uses graph transformers to propagate information across this structure, learning which cross-table patterns are predictive for each task.
A 3-hop message pass on a customer node captures: the customer's attributes, their order history, the products in those orders, other customers who ordered those products, and those customers' behavior. This is exactly the kind of signal that manual feature engineering almost never captures.
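The graph construction above can be sketched in plain Python. This is a toy version under assumed table and column names (`customers`, `orders`, `products`): rows become nodes, foreign keys become edges, and a breadth-first walk shows what a k-hop message-passing neighborhood of one customer actually reaches.

```python
from collections import deque

# Toy rows: each row becomes a node identified by (table, primary key).
orders = [
    {"order_id": "O1", "customer_id": "C1", "product_id": "P1"},
    {"order_id": "O2", "customer_id": "C2", "product_id": "P1"},
]

# Undirected adjacency built from the foreign-key columns.
edges = {}
def link(a, b):
    edges.setdefault(a, set()).add(b)
    edges.setdefault(b, set()).add(a)

for o in orders:
    link(("orders", o["order_id"]), ("customers", o["customer_id"]))
    link(("orders", o["order_id"]), ("products", o["product_id"]))

def k_hop(node, k):
    """Nodes reachable within k hops: the receptive field of a
    k-layer message-passing model rooted at `node`."""
    seen, frontier = {node}, deque([(node, 0)])
    while frontier:
        cur, depth = frontier.popleft()
        if depth == k:
            continue
        for nxt in edges.get(cur, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

# Three hops from C1 reach C1's order, the shared product, and the
# co-purchasing customer's order O2 — cross-table signal that manual
# per-table features would miss.
print(sorted(k_hop(("customers", "C1"), 3)))
```

A real relational foundation model adds node features, edge types, and timestamp-based ordering on top of this skeleton, but the reachability structure is the same.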
11. Does it work with my existing infrastructure?
Relational foundation models connect to standard data warehouses and databases: Snowflake, BigQuery, PostgreSQL, Databricks, Redshift, and others. No data migration required. The model reads your data in place and writes predictions back to your preferred destination.
12. What does fine-tuning cost?
Building a custom ML pipeline: $150K to $500K per model (team of 3 to 5 data scientists, 3 to 6 months of work, plus infrastructure). Fine-tuning a foundation model: $5K to $20K per task (a few hours of GPU compute plus platform fee). The cost reduction is 10x to 50x.
13. Do foundation models work for all prediction tasks?
They work best on relational prediction tasks where the signal is distributed across connected tables: churn, fraud, recommendations, demand forecasting, lead scoring, lifetime value, next-best-action, credit risk. They add less value on tasks where the input is not relational (image classification, audio processing, protein folding).
14. How do I evaluate before committing?
Run a zero-shot benchmark. Pick 2 to 3 prediction tasks where you have existing models with known performance. Connect your database, run the predictions, and compare AUROC side by side. This takes hours, not months. If zero-shot matches or beats your current models, the business case is clear.
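The side-by-side comparison reduces to computing AUROC for both models on the same holdout labels. A minimal sketch (the labels and scores below are invented; in practice you would pull them from your holdout set), using the rank interpretation of AUROC rather than any library:

```python
def auroc(labels, scores):
    """AUROC = probability a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical holdout: 1 = churned. Same labels, two models' scores.
labels    = [1, 0, 1, 0, 1, 0, 0, 1]
incumbent = [0.62, 0.48, 0.55, 0.40, 0.35, 0.30, 0.58, 0.70]
zero_shot = [0.78, 0.22, 0.71, 0.38, 0.45, 0.25, 0.52, 0.80]

print(f"incumbent AUROC: {auroc(labels, incumbent):.3f}")
print(f"zero-shot AUROC: {auroc(labels, zero_shot):.3f}")
```

If the zero-shot number matches or beats the incumbent on your real holdout, you have the evidence the business case needs, before any fine-tuning spend.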
15. Will foundation models replace data scientists?
They replace the repetitive work: feature engineering (80% of time according to the Stanford study), model selection, hyperparameter tuning, and pipeline maintenance. They do not replace the strategic work: defining which problems to solve, designing evaluation frameworks, interpreting predictions for business stakeholders, and building the organizational systems that turn predictions into action. Data scientists become more valuable, not less, because they can focus on impact instead of plumbing.