
AI Prediction: How Machines Forecast What Happens Next

Every business question is a prediction. Will this customer churn? Is this transaction fraud? What will demand look like next quarter? AI prediction turns historical patterns into forward-looking decisions. But the quality of the prediction depends entirely on how much of the data the model can actually see.

TL;DR

  1. Every business question is a prediction: classification (fraud?), regression (how much?), recommendation (what to show?), and forecasting (what next?). Foundation models handle all four.
  2. Prediction accuracy is bounded by the data the model can see. Traditional ML explores 2% of the feature space after manual engineering. Foundation models see the full relational graph.
  3. The feature engineering bottleneck: 12.3 hours and 878 lines of code per task. A human tests 100-200 features while the true space exceeds 10,000 possibilities across tables.
  4. On RelBench, each step up in relational context produces a measurable accuracy jump: LightGBM 62.44, LLM 68.06, GNN 75.83, KumoRFM 76.71 zero-shot, 81.14 fine-tuned.
  5. DoorDash saw a 1.8% engagement lift across 30M users. Snowflake saw a 3.2x expansion revenue lift. Databricks saw a 5.4x conversion lift. All from relational patterns flat models cannot access.

When DoorDash wants to decide which restaurants to show you at 6 PM on a Tuesday, it is making a prediction. When a bank decides whether to approve a wire transfer in real time, it is making a prediction. When Snowflake identifies which free-tier users are likely to convert to paid plans, it is making a prediction.

AI prediction is the engine behind these decisions. It takes historical data, finds patterns, and extrapolates them forward. The concept is simple. The execution is where things get complicated, because the quality of the prediction depends entirely on how much context the model can consume. And most models are starving.

The four types of prediction

Nearly every business prediction falls into one of four categories. The boundaries are fuzzy, but the business applications are distinct.

Classification

Will this event happen or not? Is this transaction fraud? Will this customer churn in the next 30 days? Will this patient be readmitted within 90 days? The output is a probability between 0 and 1, and the business sets a threshold for action. Classification is the most common prediction type in enterprise ML.
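The shape of a classifier's output can be sketched in a few lines: a model produces a probability, and the business picks the threshold. The weights and threshold below are illustrative assumptions, not learned values from real data.

```python
import math

# Minimal sketch of binary classification: a logistic model outputs a
# probability in (0, 1), and the business sets the action threshold.
# WEIGHTS, BIAS, and THRESHOLD are illustrative, not learned values.
WEIGHTS = {"claims_last_90d": 0.9, "tenure_years": -0.3}
BIAS = -0.5
THRESHOLD = 0.7  # flag for review only above 70% fraud probability

def fraud_probability(features):
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes the score into (0, 1)

def classify(features):
    p = fraud_probability(features)
    return p, p >= THRESHOLD

# A policyholder with 3 recent claims and 4 years of tenure
p, flagged = classify({"claims_last_90d": 3, "tenure_years": 4})
print(round(p, 2), flagged)  # → 0.73 True
```

Moving the threshold trades false positives against false negatives; a fraud team with limited reviewers would set it higher than a team that blocks transactions automatically.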

Regression

How much? What will this customer's lifetime value be? What is the expected loss on this loan? What will revenue be next quarter? The output is a continuous number. Regression models are the backbone of financial planning, pricing, and risk management.

Recommendation

What should we show this user? Product recommendations, content recommendations, ad targeting, next-best-action in sales. The output is a ranked list of entities, ordered by predicted relevance or engagement probability. Recommendation drives the majority of revenue at companies like Amazon, Netflix, and Spotify.
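Mechanically, recommendation reduces to scoring and ranking. A hedged sketch, with made-up relevance scores standing in for model outputs:

```python
# Minimal sketch of recommendation as ranking: score every candidate
# entity for one user, then return the top-k by predicted relevance.
# The item names and scores are illustrative stand-ins for model output.
def recommend(user_id, candidate_scores, k=3):
    """Return the k highest-scoring items for this user."""
    ranked = sorted(candidate_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:k]]

scores = {"umbrella_policy": 0.81, "roadside_assist": 0.64,
          "home_sensor_kit": 0.22, "travel_rider": 0.47}
print(recommend("PH-303", scores, k=2))  # → ['umbrella_policy', 'roadside_assist']
```

The hard part in practice is producing the scores, not the sort: the score for each (user, item) pair is itself a prediction.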

Time-series forecasting

What will the future look like over time? Demand forecasting for inventory planning, capacity planning for cloud infrastructure, workload prediction for staffing. The output is a sequence of predicted values across future time steps, often with confidence intervals.
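The output shape is what distinguishes forecasting: a sequence of future values, each with an interval. A deliberately naive sketch (real forecasters model trend and seasonality; this one just projects a recent mean with a crude interval):

```python
import statistics

# Minimal forecasting sketch: project the mean of the recent window
# forward and attach an interval from historical variability. Shows only
# the output shape: one (point, low, high) triple per future step.
def forecast(history, steps=3, window=4, z=1.96):
    recent = history[-window:]
    point = statistics.mean(recent)
    spread = z * statistics.stdev(recent)  # sample standard deviation
    return [(point, point - spread, point + spread) for _ in range(steps)]

monthly_claims = [210, 230, 260, 280, 300, 290]  # illustrative counts
for point, low, high in forecast(monthly_claims, steps=2):
    print(f"{point:.1f} [{low:.1f}, {high:.1f}]")
```

Note that the interval here is constant across steps; a serious model widens it as the horizon grows, reflecting accumulating uncertainty.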

Here is a single database that demonstrates all four prediction types. The same relational structure supports different questions.

All four predictions from one database

| prediction_type | question | PQL_query | output |
| --- | --- | --- | --- |
| Classification | Will PH-304 file a fraudulent claim? | PREDICT claims.is_fraud FOR EACH policyholders | 0.84 (yes/no probability) |
| Regression | What will PH-301's total claims cost next year? | PREDICT SUM(claims.amount, 0, 365) FOR EACH policyholders | $22,700 (dollar amount) |
| Recommendation | Which risk mitigation products should we offer PH-303? | PREDICT products.relevance FOR EACH policyholders, products | Ranked list of products |
| Forecasting | How many auto claims will we see in Q1? | PREDICT COUNT(claims.*, 0, 90) FOR EACH claim_types | 847 claims (with confidence interval) |

Same database (policyholders, claims, adjusters from the tables below), four different prediction types. A foundation model handles all four without separate pipelines.

To make this concrete, here is what AI prediction looks like on insurance claims data. The signal that determines claim outcomes spans multiple tables.

policyholders

| policyholder_id | name | policy_type | premium | tenure |
| --- | --- | --- | --- | --- |
| PH-301 | Andrea Collins | Auto + Home | $2,840/yr | 7 years |
| PH-302 | James Okafor | Auto | $1,420/yr | 2 years |
| PH-303 | Mei-Lin Chang | Home | $1,950/yr | 11 years |
| PH-304 | Derek Simmons | Auto | $2,100/yr | 4 years |

claims

| claim_id | policyholder_id | type | amount | date | status |
| --- | --- | --- | --- | --- | --- |
| CLM-501 | PH-301 | Auto collision | $8,200 | 2025-03-14 | Paid |
| CLM-502 | PH-301 | Home water damage | $14,500 | 2025-09-02 | Under review |
| CLM-503 | PH-302 | Auto theft | $22,000 | 2025-10-18 | Under review |
| CLM-504 | PH-304 | Auto collision | $4,100 | 2025-06-22 | Paid |
| CLM-505 | PH-304 | Auto collision | $6,800 | 2025-08-30 | Paid |
| CLM-506 | PH-304 | Auto collision | $9,200 | 2025-11-05 | Under review |

Highlighted: Derek has 3 collision claims in 5 months with escalating amounts ($4.1K, $6.8K, $9.2K). This temporal pattern is a strong fraud signal, but a flat feature table only shows 'claim_count = 3'.
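The gap between the flat feature and the temporal pattern can be shown directly on Derek's rows from the claims table. This sketch uses only those values; the "escalating" check is an illustrative feature, not a production fraud rule:

```python
from datetime import date

# Why the sequence matters: a flat aggregate collapses Derek's claims
# to a count, while reading them in time order exposes the escalation
# pattern (each claim larger than the last, all within a few months).
claims = [  # (date, amount) for PH-304, from the claims table above
    (date(2025, 6, 22), 4_100),
    (date(2025, 8, 30), 6_800),
    (date(2025, 11, 5), 9_200),
]

flat_feature = len(claims)  # all a flat table retains: claim_count = 3

amounts = [amt for _, amt in claims]
span_days = (claims[-1][0] - claims[0][0]).days
escalating = all(a < b for a, b in zip(amounts, amounts[1:]))
print(flat_feature, escalating, span_days)  # → 3 True 136
```

Two policyholders with claim_count = 3 are indistinguishable to the flat model; only the ordered view separates escalating Derek from someone with three unrelated claims years apart.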

adjusters

| adjuster_id | claim_id | assessment | payout_ratio | days_to_close |
| --- | --- | --- | --- | --- |
| ADJ-01 | CLM-501 | Legitimate | 100% | 12 |
| ADJ-02 | CLM-504 | Legitimate | 100% | 8 |
| ADJ-03 | CLM-505 | Legitimate | 95% | 15 |
| ADJ-04 | CLM-506 | Pending investigation | -- | -- |

The payout ratio on Derek's second claim was already reduced to 95%. His third is pending investigation. The model needs to see the claims-adjusters relationship to predict fraud probability.

The context problem

The accuracy of any prediction is bounded by the information available to the model. This sounds obvious. In practice, it is the single biggest constraint on prediction quality in enterprise ML.

Consider a churn prediction for a SaaS product. The data lives across multiple tables: user accounts, login events, feature usage, support tickets, billing history, team memberships, contract terms. A thorough churn model needs signals from all of these tables: declining login frequency, reduced feature adoption, increasing support volume, approaching contract renewal.

But traditional ML models cannot read multiple tables. They need a single flat feature matrix: one row per user, one column per feature. To get there, a data scientist writes SQL joins and aggregations, compressing rich relational data into a handful of numbers: avg_logins_30d, support_tickets_90d, days_until_renewal.

This flattening destroys three categories of signal.

  • Cross-table relationships. The fact that 3 of 5 users on the same team have already churned is a strong signal. But it requires traversing customer → team → other customers, a multi-hop path that no standard aggregation captures.
  • Temporal sequences. A count of "5 support tickets in 30 days" does not distinguish between steady complaints and a sudden spike after a product update. The sequence matters.
  • Combinatorial interactions. The interaction between declining usage and an approaching renewal and unresolved tickets is more predictive than any single feature. But engineering interaction features manually is combinatorially explosive.
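The temporal loss in the second bullet is easy to demonstrate: two users with identical 30-day ticket counts, one steady and one spiking. The `burstiness` feature is an illustrative example of the kind of signal flattening throws away:

```python
# Sketch of temporal signal loss: two users file the same number of
# support tickets in 30 days, so the flat feature is identical, but one
# pattern is steady and the other is a sudden spike near month's end.
steady = [2, 9, 16, 23, 29]     # days of the month each ticket was filed
spike = [26, 27, 27, 28, 29]    # same count, clustered after an update

flat = (len(steady), len(spike))  # flat feature: tickets_30d = 5 for both

def burstiness(days, window=5):
    """Share of tickets falling inside the busiest trailing window."""
    return max(sum(1 for d in days if t - window < d <= t)
               for t in range(1, 31)) / len(days)

print(flat, burstiness(steady), burstiness(spike))  # → (5, 5) 0.2 1.0
```

Both users hand the flat model the same number, yet the second pattern (everything after a product update) is far more predictive of churn.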

Why the bottleneck exists

The gap between "data exists in the database" and "model can use the data" is feature engineering. This step consumes 80% of the time in a typical ML project. A Stanford study quantified it: 12.3 hours and 878 lines of code per prediction task, even for experienced data scientists with full access to the data.

The time cost is not the worst part. The worst part is the signal loss. A human data scientist exploring a 10-table database will test maybe 100 to 200 feature combinations. The total feature space (tables times columns times aggregation functions times time windows) can easily exceed 10,000 possibilities. The model is trained on 2% of the available signal.
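The back-of-envelope arithmetic behind that feature-space claim is worth making explicit. The specific per-dimension counts below are illustrative assumptions, not figures from the study:

```python
# Candidate feature count = tables x numeric columns x aggregation
# functions x time windows. All four counts are illustrative assumptions
# chosen to show how quickly the product passes 10,000.
tables = 10
numeric_cols_per_table = 15
agg_functions = 7        # e.g. count, sum, mean, min, max, stddev, last
time_windows = 10        # e.g. 1d, 7d, 14d, 30d, 60d, 90d, ... 365d

candidates = tables * numeric_cols_per_table * agg_functions * time_windows
tested_by_human = 200    # upper end of what a data scientist explores

print(candidates, f"{tested_by_human / candidates:.1%}")  # → 10500 1.9%
```

And this product still excludes interaction features and multi-hop joins, which multiply the space further.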

This is why adding more data to a traditional ML pipeline often does not help. The data is there, in the database. But the pipeline cannot consume it. The model is limited not by data availability, but by the human capacity to transform that data into features.

Traditional prediction pipeline

  • Flatten 10+ tables into one row per entity
  • Human selects 100-200 features from 10,000+ candidates
  • 12.3 hours and 878 lines of code per task
  • Model sees 2% of available signal
  • New prediction task = new pipeline from scratch

Foundation model prediction

  • Model reads all tables directly as a graph
  • Model explores full feature space automatically
  • 1 second, 1 line of PQL per task
  • Model sees 100% of relational structure
  • New prediction task = new query, same model

PQL Query

PREDICT claims.status = 'Fraudulent'
FOR EACH policyholders.policyholder_id

The model reads policyholders, claims, and adjusters as a graph. It discovers that Derek's escalating claim amounts, decreasing adjuster payout ratios, and 3-claim-in-5-months cadence produce a high fraud signal.

Output

| policyholder_id | fraud_probability | top_signal |
| --- | --- | --- |
| PH-301 | 0.06 | Long tenure, first multi-policy claims |
| PH-302 | 0.38 | Auto theft on short-tenure policy |
| PH-303 | 0.02 | 11-year tenure, no claims history |
| PH-304 | 0.84 | Escalating collision amounts, 3 in 5 months |

The foundation model approach

The insight behind relational foundation models is that the prediction bottleneck is not the model. It is the data transformation. If you eliminate the transformation, the prediction becomes trivial.

KumoRFM represents your database as a temporal heterogeneous graph. Rows become nodes. Foreign keys become edges. Timestamps establish ordering. The model traverses this graph to find predictive patterns, including multi-hop relationships, temporal sequences, and structural signatures that no human would enumerate.
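The rows-become-nodes, foreign-keys-become-edges mapping can be sketched with plain dictionaries. The table contents come from the insurance example above; the graph helpers are illustrative, not KumoRFM internals:

```python
# Minimal sketch of the graph view: each row is a node keyed by its
# primary key, and each foreign key reference becomes an edge. Data is
# from the insurance example; the traversal is illustrative only.
claims = {
    "CLM-504": {"policyholder_id": "PH-304", "amount": 4_100},
    "CLM-505": {"policyholder_id": "PH-304", "amount": 6_800},
    "CLM-506": {"policyholder_id": "PH-304", "amount": 9_200},
}
adjusters = {"ADJ-03": {"claim_id": "CLM-505", "payout_ratio": 0.95}}

edges = []
for claim_id, row in claims.items():      # FK: claims -> policyholders
    edges.append((claim_id, row["policyholder_id"]))
for adj_id, row in adjusters.items():     # FK: adjusters -> claims
    edges.append((adj_id, row["claim_id"]))

def neighbors(node):
    """All nodes one edge away, in either direction."""
    return [b for a, b in edges if a == node] + \
           [a for a, b in edges if b == node]

# Two hops from Derek: through his claims to the adjuster assessments
two_hop = {n2 for n1 in neighbors("PH-304") for n2 in neighbors(n1)}
print(sorted(two_hop))  # → ['ADJ-03', 'PH-304']
```

Two hops already reach the adjuster's reduced payout ratio on Derek's second claim, a signal no single-table aggregate of the claims table contains.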

Because the model is pre-trained on billions of relational patterns across thousands of databases, it already understands the universal dynamics that recur in business data: recency effects, frequency patterns, seasonal cycles, network propagation. It does not need to learn these from scratch on your data. It recognizes them.

The interface is a query, not a pipeline. "For each customer, what is the probability of churn in the next 30 days?" The model reads your schema, builds the graph, traverses it, and returns predictions with cell-level explanations. One query. One second. No feature engineering. No training pipeline.

Real-world results

The claims are backed by production deployments at scale.

DoorDash deployed relational predictions across 30 million users for restaurant and content recommendations. The result: a 1.8% engagement lift over their existing recommendation system, which was already highly optimized. At DoorDash's scale, 1.8% translates to millions of additional orders per quarter.

Snowflake used the same approach to predict which free-tier users would convert to paid plans and which existing customers would expand their usage. The result: a 3.2x expansion revenue lift by targeting the right accounts with the right timing.

Reddit applied relational predictions to content recommendations, leveraging the full graph of users, communities, posts, comments, and interactions. The model found engagement patterns in multi-hop paths (user → community → post → commenters → other communities) that their previous system could not express.

Databricks measured a 5.4x conversion lift using relational predictions for their sales pipeline, identifying which trial users were most likely to convert based on usage patterns, team dynamics, and integration activity.

On the RelBench benchmark

Production case studies are compelling but hard to reproduce. That is why the RelBench benchmark exists: 7 databases, 30 prediction tasks, 103 million+ rows, temporal train/test splits. On this benchmark:

  • LightGBM with manual features: 62.44 average AUROC on classification
  • LLM on serialized tables (Llama 3.2 3B): 68.06 AUROC
  • Task-specific GNN: 75.83 AUROC
  • KumoRFM zero-shot: 76.71 AUROC
  • KumoRFM fine-tuned: 81.14 AUROC

The pattern is clear. The more relational context a model can consume, the better its predictions. LightGBM sees a flat table. The LLM sees serialized text. The GNN sees the graph structure. KumoRFM sees the graph structure plus universal relational patterns learned from pre-training. Each step up in context produces a measurable jump in accuracy.

What this means for prediction strategy

If your organization treats AI prediction as a pipeline-building exercise, you are leaving accuracy and speed on the table. The pipeline approach caps your model's performance at whatever signal a human can manually extract from the database. The foundation model approach removes that cap.

The practical implication is speed. When a business stakeholder asks "can we predict X?", the answer should not be "let me scope a 3-month project." It should be "let me run that query." Every prediction task that takes months to deliver is a decision that was made without data for months. That cost is invisible but real.

The data for better predictions already exists in your database. The question is whether your prediction infrastructure can actually use it.

Frequently asked questions

What is AI prediction?

AI prediction is the use of machine learning models to forecast future outcomes from historical data. It encompasses classification (will this customer churn: yes or no?), regression (what will this customer spend next quarter?), recommendation (which products should we show this user?), and time-series forecasting (what will demand look like in Q3?). The quality of any AI prediction depends on two things: the model architecture and how much of the relevant data the model can actually see.

How is AI prediction different from traditional statistical forecasting?

Traditional statistical methods (ARIMA, exponential smoothing, logistic regression) work well on structured, single-table data with clear linear relationships. AI prediction methods, particularly deep learning, can capture non-linear patterns, interactions between hundreds of variables, and multi-hop relationships across interconnected data sources. The gap widens as data complexity increases: on simple, single-table problems the difference is small; on multi-table relational data it is substantial.

What types of business predictions can AI make?

Four main types: (1) Classification: binary or multi-class outcomes like churn, fraud, conversion, and default risk. (2) Regression: continuous values like customer lifetime value, expected revenue, and credit scores. (3) Recommendation: ranking entities like products, content, or offers for a specific user. (4) Forecasting: time-dependent predictions like demand, inventory needs, and capacity planning. Foundation models like KumoRFM handle all four types from the same model.

Why does relational context improve AI predictions?

Most business data is stored across multiple connected tables: customers, orders, products, interactions, support tickets. Traditional ML requires flattening this into a single table, losing cross-table relationships and temporal sequences. Models that learn directly from the relational structure (representing the database as a graph) see patterns that span 3-4 tables deep, like a customer's churn risk being linked to the return rates of products they bought. This consistently produces 10-30% AUROC improvements over flat-table approaches.

What is the feature engineering bottleneck in AI prediction?

Before a traditional ML model can make predictions, someone must manually transform multi-table data into a flat feature matrix: writing SQL joins, computing aggregations, and selecting time windows. A Stanford study measured this at 12.3 hours and 878 lines of code per prediction task. This bottleneck is why most prediction projects take months to deliver. Foundation models eliminate it by learning directly from raw relational tables, reducing the time from hours to seconds.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.