
AI Prediction: How Machines Forecast What Happens Next

Every business question is a prediction. Will this customer churn? Is this transaction fraud? What will demand look like next quarter? AI prediction turns historical patterns into forward-looking decisions. But the quality of the prediction depends entirely on how much of the data the model can actually see.

TL;DR

  1. Every business question is a prediction: classification (fraud?), regression (how much?), recommendation (what to show?), and forecasting (what next?). Foundation models handle all four.
  2. Prediction accuracy is bounded by the data the model can see. Traditional ML explores 2% of the feature space after manual engineering. Foundation models see the full relational graph.
  3. The feature engineering bottleneck: 12.3 hours and 878 lines of code per task. A human tests 100-200 features while the true space exceeds 10,000 possibilities across tables.
  4. On RelBench, each step up in relational context produces a measurable accuracy jump: LightGBM 62.44, LLM 68.06, GNN 75.83, KumoRFM 76.71 zero-shot, 81.14 fine-tuned.
  5. DoorDash saw a 1.8% engagement lift across 30M users. Snowflake saw a 3.2x expansion revenue lift. Databricks saw a 5.4x conversion lift. All from relational patterns flat models cannot access.

When DoorDash wants to decide which restaurants to show you at 6 PM on a Tuesday, it is making a prediction. When a bank decides whether to approve a wire transfer in real time, it is making a prediction. When Snowflake identifies which free-tier users are likely to convert to paid plans, it is making a prediction.

AI prediction is the engine behind these decisions. It takes historical data, finds patterns, and extrapolates them forward. The concept is simple. The execution is where things get complicated, because the quality of the prediction depends entirely on how much context the model can consume. And most models are starving.

The four types of prediction

Nearly every business prediction falls into one of four categories. The boundaries are fuzzy, but the business applications are distinct.

Classification

Will this event happen or not? Is this transaction fraud? Will this customer churn in the next 30 days? Will this patient be readmitted within 90 days? The output is a probability between 0 and 1, and the business sets a threshold for action. Classification is the most common prediction type in enterprise ML.
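The shape of a classifier's output can be sketched in a few lines: a model produces a probability, and the business picks the threshold. The weights and threshold below are illustrative assumptions, not learned values from real data.

```python
import math

# Minimal sketch of binary classification: a logistic model outputs a
# probability in (0, 1), and the business sets the action threshold.
# WEIGHTS, BIAS, and THRESHOLD are illustrative, not learned values.
WEIGHTS = {"claims_last_90d": 0.9, "tenure_years": -0.3}
BIAS = -0.5
THRESHOLD = 0.7  # flag for review only above 70% fraud probability

def fraud_probability(features):
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes the score into (0, 1)

def classify(features):
    p = fraud_probability(features)
    return p, p >= THRESHOLD

# A policyholder with 3 recent claims and 4 years of tenure
p, flagged = classify({"claims_last_90d": 3, "tenure_years": 4})
print(round(p, 2), flagged)  # → 0.73 True
```

Moving the threshold trades false positives against false negatives; a fraud team with limited reviewers would set it higher than a team that blocks transactions automatically.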

Regression

How much? What will this customer's lifetime value be? What is the expected loss on this loan? What will revenue be next quarter? The output is a continuous number. Regression models are the backbone of financial planning, pricing, and risk management.

Recommendation

What should we show this user? Product recommendations, content recommendations, ad targeting, next-best-action in sales. The output is a ranked list of entities, ordered by predicted relevance or engagement probability. Recommendation drives the majority of revenue at companies like Amazon, Netflix, and Spotify.
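Mechanically, recommendation reduces to scoring and ranking. A hedged sketch, with made-up relevance scores standing in for model outputs:

```python
# Minimal sketch of recommendation as ranking: score every candidate
# entity for one user, then return the top-k by predicted relevance.
# The item names and scores are illustrative stand-ins for model output.
def recommend(user_id, candidate_scores, k=3):
    """Return the k highest-scoring items for this user."""
    ranked = sorted(candidate_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:k]]

scores = {"umbrella_policy": 0.81, "roadside_assist": 0.64,
          "home_sensor_kit": 0.22, "travel_rider": 0.47}
print(recommend("PH-303", scores, k=2))  # → ['umbrella_policy', 'roadside_assist']
```

The hard part in practice is producing the scores, not the sort: the score for each (user, item) pair is itself a prediction.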

Time-series forecasting

What will the future look like over time? Demand forecasting for inventory planning, capacity planning for cloud infrastructure, workload prediction for staffing. The output is a sequence of predicted values across future time steps, often with confidence intervals.
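The output shape is what distinguishes forecasting: a sequence of future values, each with an interval. A deliberately naive sketch (real forecasters model trend and seasonality; this one just projects a recent mean with a crude interval):

```python
import statistics

# Minimal forecasting sketch: project the mean of the recent window
# forward and attach an interval from historical variability. Shows only
# the output shape: one (point, low, high) triple per future step.
def forecast(history, steps=3, window=4, z=1.96):
    recent = history[-window:]
    point = statistics.mean(recent)
    spread = z * statistics.stdev(recent)  # sample standard deviation
    return [(point, point - spread, point + spread) for _ in range(steps)]

monthly_claims = [210, 230, 260, 280, 300, 290]  # illustrative counts
for point, low, high in forecast(monthly_claims, steps=2):
    print(f"{point:.1f} [{low:.1f}, {high:.1f}]")
```

Note that the interval here is constant across steps; a serious model widens it as the horizon grows, reflecting accumulating uncertainty.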

Here is a single database that demonstrates all four prediction types. The same relational structure supports different questions.

All four predictions from one database

| prediction_type | question | PQL_query | output |
| --- | --- | --- | --- |
| Classification | Will PH-304 file a fraudulent claim? | PREDICT claims.is_fraud FOR EACH policyholders | 0.84 (yes/no probability) |
| Regression | What will PH-301's total claims cost next year? | PREDICT SUM(claims.amount, 0, 365) FOR EACH policyholders | $22,700 (dollar amount) |
| Recommendation | Which risk mitigation products should we offer PH-303? | PREDICT products.relevance FOR EACH policyholders, products | Ranked list of products |
| Forecasting | How many auto claims will we see in Q1? | PREDICT COUNT(claims.*, 0, 90) FOR EACH claim_types | 847 claims (with confidence interval) |

Same database (policyholders, claims, adjusters from the tables below), four different prediction types. A foundation model handles all four without separate pipelines.

To make this concrete, here is what AI prediction looks like on insurance claims data. The signal that determines claim outcomes spans multiple tables.

policyholders

| policyholder_id | name | policy_type | premium | tenure |
| --- | --- | --- | --- | --- |
| PH-301 | Andrea Collins | Auto + Home | $2,840/yr | 7 years |
| PH-302 | James Okafor | Auto | $1,420/yr | 2 years |
| PH-303 | Mei-Lin Chang | Home | $1,950/yr | 11 years |
| PH-304 | Derek Simmons | Auto | $2,100/yr | 4 years |

claims

| claim_id | policyholder_id | type | amount | date | status |
| --- | --- | --- | --- | --- | --- |
| CLM-501 | PH-301 | Auto collision | $8,200 | 2025-03-14 | Paid |
| CLM-502 | PH-301 | Home water damage | $14,500 | 2025-09-02 | Under review |
| CLM-503 | PH-302 | Auto theft | $22,000 | 2025-10-18 | Under review |
| CLM-504 | PH-304 | Auto collision | $4,100 | 2025-06-22 | Paid |
| CLM-505 | PH-304 | Auto collision | $6,800 | 2025-08-30 | Paid |
| CLM-506 | PH-304 | Auto collision | $9,200 | 2025-11-05 | Under review |

Highlighted: Derek has 3 collision claims in 5 months with escalating amounts ($4.1K, $6.8K, $9.2K). This temporal pattern is a strong fraud signal, but a flat feature table only shows 'claim_count = 3'.
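The gap between the flat feature and the temporal pattern can be shown directly on Derek's rows from the claims table. This sketch uses only those values; the "escalating" check is an illustrative feature, not a production fraud rule:

```python
from datetime import date

# Why the sequence matters: a flat aggregate collapses Derek's claims
# to a count, while reading them in time order exposes the escalation
# pattern (each claim larger than the last, all within a few months).
claims = [  # (date, amount) for PH-304, from the claims table above
    (date(2025, 6, 22), 4_100),
    (date(2025, 8, 30), 6_800),
    (date(2025, 11, 5), 9_200),
]

flat_feature = len(claims)  # all a flat table retains: claim_count = 3

amounts = [amt for _, amt in claims]
span_days = (claims[-1][0] - claims[0][0]).days
escalating = all(a < b for a, b in zip(amounts, amounts[1:]))
print(flat_feature, escalating, span_days)  # → 3 True 136
```

Two policyholders with claim_count = 3 are indistinguishable to the flat model; only the ordered view separates escalating Derek from someone with three unrelated claims years apart.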

adjusters

| adjuster_id | claim_id | assessment | payout_ratio | days_to_close |
| --- | --- | --- | --- | --- |
| ADJ-01 | CLM-501 | Legitimate | 100% | 12 |
| ADJ-02 | CLM-504 | Legitimate | 100% | 8 |
| ADJ-03 | CLM-505 | Legitimate | 95% | 15 |
| ADJ-04 | CLM-506 | Pending investigation | -- | -- |

The payout ratio on Derek's second claim was already reduced to 95%. His third is pending investigation. The model needs to see the claims-adjusters relationship to predict fraud probability.

The context problem

The accuracy of any prediction is bounded by the information available to the model. This sounds obvious. In practice, it is the single biggest constraint on prediction quality in enterprise ML.

Consider a churn prediction for a SaaS product. The data lives across multiple tables: user accounts, login events, feature usage, support tickets, billing history, team memberships, contract terms. A thorough churn model needs signals from all of these tables: declining login frequency, reduced feature adoption, increasing support volume, approaching contract renewal.

But traditional ML models cannot read multiple tables. They need a single flat feature matrix: one row per user, one column per feature. To get there, a data scientist writes SQL joins and aggregations, compressing rich relational data into a handful of numbers: avg_logins_30d, support_tickets_90d, days_until_renewal.

This flattening destroys three categories of signal.

  • Cross-table relationships. The fact that 3 of 5 users on the same team have already churned is a strong signal. But it requires traversing customer → team → other customers, a multi-hop path that no standard aggregation captures.
  • Temporal sequences. A count of "5 support tickets in 30 days" does not distinguish between steady complaints and a sudden spike after a product update. The sequence matters.
  • Combinatorial interactions. The interaction between declining usage and an approaching renewal and unresolved tickets is more predictive than any single feature. But engineering interaction features manually is combinatorially explosive.
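The temporal loss in the second bullet is easy to demonstrate: two users with identical 30-day ticket counts, one steady and one spiking. The `burstiness` feature is an illustrative example of the kind of signal flattening throws away:

```python
# Sketch of temporal signal loss: two users file the same number of
# support tickets in 30 days, so the flat feature is identical, but one
# pattern is steady and the other is a sudden spike near month's end.
steady = [2, 9, 16, 23, 29]     # days of the month each ticket was filed
spike = [26, 27, 27, 28, 29]    # same count, clustered after an update

flat = (len(steady), len(spike))  # flat feature: tickets_30d = 5 for both

def burstiness(days, window=5):
    """Share of tickets falling inside the busiest trailing window."""
    return max(sum(1 for d in days if t - window < d <= t)
               for t in range(1, 31)) / len(days)

print(flat, burstiness(steady), burstiness(spike))  # → (5, 5) 0.2 1.0
```

Both users hand the flat model the same number, yet the second pattern (everything after a product update) is far more predictive of churn.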

Why the bottleneck exists

The gap between "data exists in the database" and "model can use the data" is feature engineering. This step consumes 80% of the time in a typical ML project. A Stanford study quantified it: 12.3 hours and 878 lines of code per prediction task, even for experienced data scientists with full access to the data.

The time cost is not the worst part. The worst part is the signal loss. A human data scientist exploring a 10-table database will test maybe 100 to 200 feature combinations. The total feature space (tables times columns times aggregation functions times time windows) can easily exceed 10,000 possibilities. The model is trained on 2% of the available signal.
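The back-of-envelope arithmetic behind that feature-space claim is worth making explicit. The specific per-dimension counts below are illustrative assumptions, not figures from the study:

```python
# Candidate feature count = tables x numeric columns x aggregation
# functions x time windows. All four counts are illustrative assumptions
# chosen to show how quickly the product passes 10,000.
tables = 10
numeric_cols_per_table = 15
agg_functions = 7        # e.g. count, sum, mean, min, max, stddev, last
time_windows = 10        # e.g. 1d, 7d, 14d, 30d, 60d, 90d, ... 365d

candidates = tables * numeric_cols_per_table * agg_functions * time_windows
tested_by_human = 200    # upper end of what a data scientist explores

print(candidates, f"{tested_by_human / candidates:.1%}")  # → 10500 1.9%
```

And this product still excludes interaction features and multi-hop joins, which multiply the space further.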

This is why adding more data to a traditional ML pipeline often does not help. The data is there, in the database. But the pipeline cannot consume it. The model is limited not by data availability, but by the human capacity to transform that data into features.

Traditional prediction pipeline

  • Flatten 10+ tables into one row per entity
  • Human selects 100-200 features from 10,000+ candidates
  • 12.3 hours and 878 lines of code per task
  • Model sees 2% of available signal
  • New prediction task = new pipeline from scratch

Foundation model prediction

  • Model reads all tables directly as a graph
  • Model explores full feature space automatically
  • 1 second, 1 line of PQL per task
  • Model sees 100% of relational structure
  • New prediction task = new query, same model

PQL Query

PREDICT claims.status = 'Fraudulent'
FOR EACH policyholders.policyholder_id

The model reads policyholders, claims, and adjusters as a graph. It discovers that Derek's escalating claim amounts, decreasing adjuster payout ratios, and 3-claim-in-5-months cadence produce a high fraud signal.

Output

| policyholder_id | fraud_probability | top_signal |
| --- | --- | --- |
| PH-301 | 0.06 | Long tenure, first multi-policy claims |
| PH-302 | 0.38 | Auto theft on short-tenure policy |
| PH-303 | 0.02 | 11-year tenure, no claims history |
| PH-304 | 0.84 | Escalating collision amounts, 3 in 5 months |

The foundation model approach

The insight behind relational foundation models is that the prediction bottleneck is not the model. It is the data transformation. If you eliminate the transformation, the prediction becomes trivial.

KumoRFM represents your database as a temporal heterogeneous graph. Rows become nodes. Foreign keys become edges. Timestamps establish ordering. The model traverses this graph to find predictive patterns, including multi-hop relationships, temporal sequences, and structural signatures that no human would enumerate.
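The rows-become-nodes, foreign-keys-become-edges mapping can be sketched with plain dictionaries. The table contents come from the insurance example above; the graph helpers are illustrative, not KumoRFM internals:

```python
# Minimal sketch of the graph view: each row is a node keyed by its
# primary key, and each foreign key reference becomes an edge. Data is
# from the insurance example; the traversal is illustrative only.
claims = {
    "CLM-504": {"policyholder_id": "PH-304", "amount": 4_100},
    "CLM-505": {"policyholder_id": "PH-304", "amount": 6_800},
    "CLM-506": {"policyholder_id": "PH-304", "amount": 9_200},
}
adjusters = {"ADJ-03": {"claim_id": "CLM-505", "payout_ratio": 0.95}}

edges = []
for claim_id, row in claims.items():      # FK: claims -> policyholders
    edges.append((claim_id, row["policyholder_id"]))
for adj_id, row in adjusters.items():     # FK: adjusters -> claims
    edges.append((adj_id, row["claim_id"]))

def neighbors(node):
    """All nodes one edge away, in either direction."""
    return [b for a, b in edges if a == node] + \
           [a for a, b in edges if b == node]

# Two hops from Derek: through his claims to the adjuster assessments
two_hop = {n2 for n1 in neighbors("PH-304") for n2 in neighbors(n1)}
print(sorted(two_hop))  # → ['ADJ-03', 'PH-304']
```

Two hops already reach the adjuster's reduced payout ratio on Derek's second claim, a signal no single-table aggregate of the claims table contains.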

Because the model is pre-trained on billions of relational patterns across thousands of databases, it already understands the universal dynamics that recur in business data: recency effects, frequency patterns, seasonal cycles, network propagation. It does not need to learn these from scratch on your data. It recognizes them.

The interface is a query, not a pipeline. "For each customer, what is the probability of churn in the next 30 days?" The model reads your schema, builds the graph, traverses it, and returns predictions with cell-level explanations. One query. One second. No feature engineering. No training pipeline.

Real-world results

The claims are backed by production deployments at scale.

DoorDash deployed relational predictions across 30 million users for restaurant and content recommendations. The result: a 1.8% engagement lift over their existing recommendation system, which was already highly optimized. At DoorDash's scale, 1.8% translates to millions of additional orders per quarter.

Snowflake used the same approach to predict which free-tier users would convert to paid plans and which existing customers would expand their usage. The result: a 3.2x expansion revenue lift by targeting the right accounts with the right timing.

Reddit applied relational predictions to content recommendations, leveraging the full graph of users, communities, posts, comments, and interactions. The model found engagement patterns in multi-hop paths (user → community → post → commenters → other communities) that their previous system could not express.

Databricks measured a 5.4x conversion lift using relational predictions for their sales pipeline, identifying which trial users were most likely to convert based on usage patterns, team dynamics, and integration activity.

On the RelBench benchmark

Production case studies are compelling but hard to reproduce. That is why the RelBench benchmark exists: 7 databases, 30 prediction tasks, 103 million+ rows, temporal train/test splits. On this benchmark:

  • LightGBM with manual features: 62.44 average AUROC on classification
  • LLM on serialized tables (Llama 3.2 3B): 68.06 AUROC
  • Task-specific GNN: 75.83 AUROC
  • KumoRFM zero-shot: 76.71 AUROC
  • KumoRFM fine-tuned: 81.14 AUROC

The pattern is clear. The more relational context a model can consume, the better its predictions. LightGBM sees a flat table. The LLM sees serialized text. The GNN sees the graph structure. KumoRFM sees the graph structure plus universal relational patterns learned from pre-training. Each step up in context produces a measurable jump in accuracy.

What this means for prediction strategy

If your organization treats AI prediction as a pipeline-building exercise, you are leaving accuracy and speed on the table. The pipeline approach caps your model's performance at whatever signal a human can manually extract from the database. The foundation model approach removes that cap.

The practical implication is speed. When a business stakeholder asks "can we predict X?", the answer should not be "let me scope a 3-month project." It should be "let me run that query." Every prediction task that takes months to deliver is a decision that was made without data for months. That cost is invisible but real.

The data for better predictions already exists in your database. The question is whether your prediction infrastructure can actually use it.

Frequently asked questions

What is AI prediction?

AI prediction is the use of machine learning models to forecast future outcomes from historical data. It encompasses classification (will this customer churn: yes or no?), regression (what will this customer spend next quarter?), recommendation (which products should we show this user?), and time-series forecasting (what will demand look like in Q3?). The quality of any AI prediction depends on two things: the model architecture and how much of the relevant data the model can actually see.

How is AI prediction different from traditional statistical forecasting?

Traditional statistical methods (ARIMA, exponential smoothing, logistic regression) work well on structured, single-table data with clear linear relationships. AI prediction methods, particularly deep learning, can capture non-linear patterns, interactions between hundreds of variables, and multi-hop relationships across interconnected data sources. The gap widens as data complexity increases: on simple, single-table problems the difference is small; on multi-table relational data it is substantial.

What types of business predictions can AI make?

Four main types: (1) Classification: binary or multi-class outcomes like churn, fraud, conversion, and default risk. (2) Regression: continuous values like customer lifetime value, expected revenue, and credit scores. (3) Recommendation: ranking entities like products, content, or offers for a specific user. (4) Forecasting: time-dependent predictions like demand, inventory needs, and capacity planning. Foundation models like KumoRFM handle all four types from the same model.

Why does relational context improve AI predictions?

Most business data is stored across multiple connected tables: customers, orders, products, interactions, support tickets. Traditional ML requires flattening this into a single table, losing cross-table relationships and temporal sequences. Models that learn directly from the relational structure (representing the database as a graph) see patterns that span 3-4 tables deep, like a customer's churn risk being linked to the return rates of products they bought. This consistently produces 10-30% AUROC improvements over flat-table approaches.

What is the feature engineering bottleneck in AI prediction?

Before a traditional ML model can make predictions, someone must manually transform multi-table data into a flat feature matrix: writing SQL joins, computing aggregations, and selecting time windows. A Stanford study measured this at 12.3 hours and 878 lines of code per prediction task. This bottleneck is why most prediction projects take months to deliver. Foundation models eliminate it by learning directly from raw relational tables, reducing the time from hours to seconds.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.