
Predictive Analytics: What It Actually Is and Why Most Implementations Fail

Most companies buy predictive analytics tools and still can't predict next quarter's demand. The problem isn't the tool. It's that enterprise data doesn't fit the flat-table format these tools require.

TL;DR

  • 85% of analytics projects fail (Gartner) and 87% of ML models never reach production (VentureBeat). The root cause is the data preparation gap, not the math.
  • Enterprise data lives across 10-50 relational tables, but every predictive tool requires a single flat table. Bridging that gap takes 12.3 hours and 878 lines of code per task.
  • Flattening destroys three signal types worth hundreds of millions at enterprise scale: multi-hop relationships, temporal sequences, and graph topology.
  • KumoRFM outperforms manual feature engineering by 14+ AUROC points on RelBench (76.71 vs 62.44) by reading relational databases directly as graphs.
  • The shift is from pipeline-building to question-asking: one PQL query replaces months of feature engineering, and the same model handles any prediction task.

A mid-size retailer with 1,200 stores has five years of transaction data across 12 tables: point-of-sale records, inventory movements, supplier lead times, weather data by region, promotional calendars, loyalty program activity, returns, store-level staffing, and four more. They bought a predictive analytics platform to forecast demand by SKU by store by week. Eighteen months later, they are still running the same spreadsheet-based forecasts their planning team built in 2019.

The platform works. It can train models, tune hyperparameters, and generate forecasts. The problem is that it needs a single flat table as input: one row per SKU-store-week, with every relevant signal pre-computed as a column. Building that table from 12 source tables requires a team of three data engineers working for months. Every time the business asks a new question, the pipeline needs to be rebuilt.

This story repeats across industries. Gartner has estimated that 85% of analytics projects fail to deliver on their stated goals. The failure is rarely in the math. It is in the gap between how data is stored and how analytics tools consume it.

What predictive analytics actually means

Predictive analytics is the use of historical data, statistical algorithms, and machine learning to estimate the probability of future outcomes. The term covers a range of techniques, from simple linear regression to deep neural networks, applied to a common goal: answering "what will happen next" rather than "what happened."

To understand where predictive analytics fits, it helps to see the three levels of analytics maturity.

Descriptive analytics: what happened

Dashboards, reports, and KPIs. Last quarter's revenue was $47M. Churn rate was 4.2%. Average order value increased 8%. This is where most organizations spend most of their analytics effort. It is necessary but not sufficient, because knowing what happened does not tell you what to do about it.

Predictive analytics: what will happen

Using patterns in historical data to forecast future events. This customer has a 73% probability of churning in the next 30 days. Demand for SKU #4491 at the Chicago store will be 340 units next week. This transaction has a 91% probability of being fraudulent. Predictive analytics shifts decisions from reactive to proactive.

Prescriptive analytics: what should we do

Optimization on top of predictions. If we offer this customer a 15% discount, their churn probability drops to 31% and the expected lifetime value gain is $2,400. Prescriptive analytics requires a working predictive layer as input. Most organizations never get here because they cannot get predictions working reliably.

The real-world applications that work

Predictive analytics delivers the most value when three conditions are met: the prediction has a clear business action attached, the data history is long enough to contain patterns, and the entity being predicted has enough volume for statistical significance.

Churn prediction

Telecom, SaaS, and subscription businesses use churn models to identify customers likely to cancel. T-Mobile reported reducing churn by 50% using predictive models on customer interaction data. The key signal often comes from behavioral sequences (declining usage, support ticket spikes, payment delays) rather than static attributes.

Demand forecasting

Walmart processes 2.5 petabytes of data per hour across its supply chain. Their demand forecasting models integrate point-of-sale data, weather, local events, and economic indicators to predict store-level demand. A 1% improvement in forecast accuracy for a retailer of Walmart's scale can translate to hundreds of millions in reduced inventory waste.

Here is what the underlying data looks like for a mid-size retailer. Three tables drive demand prediction: stores, products, and daily sales.

stores

store_id | city     | region  | sqft   | opened
S101     | Chicago  | Midwest | 42,000 | 2018-03-15
S102     | Houston  | South   | 38,500 | 2019-07-22
S103     | Portland | West    | 29,000 | 2021-01-10

products

sku   | name                  | category  | price | supplier
P4491 | Organic Oat Milk 64oz | Dairy Alt | $5.99 | Oatly
P4492 | Almond Milk 64oz      | Dairy Alt | $4.49 | Blue Diamond
P4493 | Soy Milk 32oz         | Dairy Alt | $3.29 | Silk

daily_sales

date       | store_id | sku   | units_sold | promo_active
2025-12-01 | S101     | P4491 | 87         | No
2025-12-02 | S101     | P4491 | 134        | Yes
2025-12-03 | S101     | P4491 | 142        | Yes
2025-12-04 | S101     | P4491 | 61         | No
2025-12-01 | S102     | P4491 | 43         | No
2025-12-02 | S102     | P4491 | 39         | No

Highlighted: promo-driven demand spike in Chicago. Houston shows no promo and lower baseline. A flat model sees 'units_sold' but misses the store-region-promo interaction.
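As a quick sanity check on that interaction, here is a minimal pandas sketch computing per-store promo lift from the daily_sales rows above (only the columns needed for the calculation are reproduced):

```python
import pandas as pd

# daily_sales rows from the table above.
sales = pd.DataFrame({
    "store_id": ["S101", "S101", "S101", "S101", "S102", "S102"],
    "units_sold": [87, 134, 142, 61, 43, 39],
    "promo_active": ["No", "Yes", "Yes", "No", "No", "No"],
})

# Promo lift per store: mean units on promo days vs non-promo days.
lift = sales.groupby(["store_id", "promo_active"])["units_sold"].mean().unstack()
print(lift)
# S101 averages 74 units on non-promo days vs 138 on promo days, roughly a
# 1.9x lift. S102 has no promo days at all, so its lift is undefined; a
# single per-store average would hide this asymmetry entirely.
```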

Fraud detection

JPMorgan Chase processes $10 trillion in payments annually. Their fraud detection systems score transactions in real time, flagging anomalies based on transaction patterns, merchant history, and network-level signals. The challenge is that fraud patterns evolve constantly, requiring models that adapt without monthly retraining cycles.

Customer lifetime value

Starbucks uses predictive models to estimate the long-term value of each loyalty program member, then allocates marketing spend accordingly. The models incorporate purchase frequency, category preferences, store visit patterns, and response to past promotions. The signal spans multiple tables and multiple time horizons.

Why most implementations fail

The failure rate of predictive analytics projects is strikingly high. Beyond the Gartner 85% figure, VentureBeat reported that 87% of data science projects never make it to production. These are not small organizations. These are companies with dedicated data teams, modern infrastructure, and significant budgets.

The root cause is almost always the same: the data preparation bottleneck.

The flat table requirement

Every major predictive analytics tool, from scikit-learn to SAS to DataRobot, requires input in the form of a flat table. One row per entity, one column per feature. But enterprise data is not flat. It lives in relational databases with 10 to 50 interconnected tables linked by foreign keys.

Converting relational data to a flat table requires: deciding which tables to join, choosing aggregation functions (sum, count, average, max), selecting time windows (7 days, 30 days, 90 days), handling missing values, and encoding categorical variables. A Stanford study measured this process at 12.3 hours and 878 lines of code per prediction task for experienced data scientists.
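To make the manual decisions concrete, here is a minimal pandas sketch of that flattening pipeline; the table names, columns, and the 30-day window are hypothetical, chosen only to illustrate the join/aggregate/window steps listed above:

```python
import pandas as pd

# Hypothetical source tables, standing in for two of 10-50 relational tables.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [50.0, 30.0, 200.0],
    "ts": pd.to_datetime(["2025-01-05", "2025-01-20", "2025-01-10"]),
})
customers = pd.DataFrame({"customer_id": [1, 2], "region": ["West", "South"]})

# Manual decision 1: choose a time window (here, 30 days before a cutoff).
cutoff = pd.Timestamp("2025-02-01")
window = orders[orders["ts"] >= cutoff - pd.Timedelta(days=30)]

# Manual decision 2: choose aggregation functions (count, mean, sum).
feats = window.groupby("customer_id")["amount"].agg(
    order_count_30d="count", avg_amount_30d="mean", total_spend_30d="sum"
).reset_index()

# Manual decisions 3-4: which tables to join, and how to fill missing values.
flat = customers.merge(feats, on="customer_id", how="left").fillna(0)
print(flat)
```

Every one of those choices has alternatives (other windows, other aggregates, other joins), and each new prediction task repeats them from scratch.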

The information loss problem: three signals that get destroyed

Flattening relational data into a single table destroys three specific types of signal. Each one is worth understanding, because they are often the difference between a model that catches 60% of churn and one that catches 85%. For a bank with 10 million customers, that 25-point gap is hundreds of millions in retained revenue.

Signal 1: Multi-hop relationships

Consider a bank trying to predict which customers will default on their credit card. Here is the actual data across three tables:

customers

customer_id | name         | credit_limit | account_age | status
C001        | Sarah Chen   | $15,000      | 4 years     | Current
C002        | James Wilson | $8,000       | 2 years     | Current
C003        | Maria Lopez  | $12,000      | 3 years     | Defaulted
C004        | David Kim    | $10,000      | 1 year      | Defaulted

C003 and C004 both defaulted. The question: can we predict whether C001 or C002 will follow?

transactions

txn_id | customer_id | merchant_id | amount | date
T1001  | C001        | M50         | $247   | 2025-01-15
T1002  | C001        | M51         | $89    | 2025-02-03
T1003  | C002        | M50         | $312   | 2025-01-22
T1004  | C002        | M52         | $45    | 2025-02-10
T1005  | C003        | M50         | $198   | 2024-10-05
T1006  | C003        | M51         | $156   | 2024-11-12
T1007  | C004        | M50         | $267   | 2024-09-18
T1008  | C004        | M51         | $134   | 2024-10-30

Look at the merchant IDs. C003 and C004 (who defaulted) both shopped at M50 and M51. C001 also shops at M50 and M51. C002 shops at M50 and M52, but not M51.

merchants

merchant_id | name               | category      | risk_score
M50         | QuickCash Advance  | Cash Services | High
M51         | EZ Pawn & Jewelry  | Pawn Shop     | High
M52         | Whole Foods Market | Grocery       | Low

Now follow the multi-hop path. C001 shops at M50 (QuickCash Advance) and M51 (EZ Pawn). Both of these merchants are also frequented by C003 and C004, who defaulted. The path is: C001 → transactions → merchants (M50, M51) → transactions (of C003, C004) → default status. Four hops.

Now here is what a data scientist actually builds. They flatten these three tables into a single row per customer:

flat_feature_table (what XGBoost sees)

customer_id | txn_count | avg_amount | credit_util | account_age_yrs
C001        | 2         | $168.00    | 62%         | 4
C002        | 2         | $178.50    | 55%         | 2
C003        | 2         | $177.00    | 81%         | 3
C004        | 2         | $200.50    | 78%         | 1

All four customers have txn_count = 2 and similar average amounts. The flat table gives no indication that C001 shares both high-risk merchants with the defaulted customers while C002 shares only one. That merchant-overlap signal, the strongest predictor of default risk here, was destroyed during flattening because it requires traversing four hops across three tables.

No data scientist would write this feature. Not because the SQL is hard, but because you would need to imagine that merchant-overlap with defaulted customers is predictive, then write a 4-way join to test it. Across thousands of possible multi-hop paths, humans explore a tiny fraction. On the RelBench benchmark, models that traverse these paths automatically outperform flat-table models by 14 AUROC points on average (76.71 vs 62.44).
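For illustration, here is what the merchant-overlap feature looks like once you already know to write it: a pandas sketch of the four-hop traversal over the three tables above (only the columns the feature needs are reproduced):

```python
import pandas as pd

# The article's customers and transactions tables, abbreviated.
customers = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003", "C004"],
    "status": ["Current", "Current", "Defaulted", "Defaulted"],
})
transactions = pd.DataFrame({
    "customer_id": ["C001", "C001", "C002", "C002",
                    "C003", "C003", "C004", "C004"],
    "merchant_id": ["M50", "M51", "M50", "M52",
                    "M50", "M51", "M50", "M51"],
})

# Hops 1-2: which merchants were visited by defaulted customers?
defaulted = customers.loc[customers["status"] == "Defaulted", "customer_id"]
risky_merchants = set(
    transactions.loc[transactions["customer_id"].isin(defaulted), "merchant_id"]
)

# Hops 3-4: for each customer, count merchants shared with defaulters.
overlap = (
    transactions.assign(risky=transactions["merchant_id"].isin(risky_merchants))
    .groupby("customer_id")["risky"].sum()
)
print(overlap)  # C001: 2 shared merchants, C002: 1 -- the signal the flat table lost
```

The code itself is short. The hard part is knowing, out of thousands of candidate multi-hop paths, that this one is worth testing.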

Signal 2: Temporal sequences within aggregation windows

Two customers each placed 5 orders in 30 days. Same count. Completely different stories:

orders: Customer A (disengaging)

order_id | date  | amount | days_since_prev
O201     | Mar 1 | $89    | -
O202     | Mar 2 | $67    | 1
O203     | Mar 3 | $45    | 1
O204     | Mar 5 | $34    | 2
O205     | Mar 7 | $22    | 2

5 orders in the first week, then silence for 23 days. Declining amounts. This customer is disengaging.

orders: Customer B (accelerating)

order_id | date   | amount | days_since_prev
O301     | Mar 1  | $34    | -
O302     | Mar 8  | $45    | 7
O303     | Mar 15 | $67    | 7
O304     | Mar 22 | $89    | 7
O305     | Mar 29 | $112   | 7

1 order per week, steady cadence, increasing amounts. This customer is accelerating.

Now here is what the data scientist builds:

flat_feature_table (what the model sees)

customer   | order_count_30d | avg_order_value | total_spend | reality
Customer A | 5               | $51.40          | $257        | Disengaging (churn risk)
Customer B | 5               | $69.40          | $347        | Accelerating (growth)

The count and average look similar; the realities are opposite. Customer A crammed 5 declining orders into week 1 and then disappeared; Customer B placed one rising-value order per week. A model trained on this flat table cannot tell the churn risk from the growth opportunity, because the aggregation function (count) collapsed the temporal dimension. The when and the trajectory are gone.

To recover this signal manually, you would need to engineer: week-1 count, week-2 count, week-3 count, week-4 count, order value trend slope, inter-order interval trend, acceleration/deceleration flag, last-7-day vs first-7-day ratio. That is 8+ features to recover what a model operating on raw transaction sequences sees natively in the timestamps and amounts.
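A minimal pandas sketch of a few of those hand-engineered recovery features, applied to the two order histories above (dates reduced to day-of-month for brevity):

```python
import pandas as pd

# Customer A's and Customer B's orders from the tables above.
a = pd.DataFrame({"day": [1, 2, 3, 5, 7], "amount": [89, 67, 45, 34, 22]})
b = pd.DataFrame({"day": [1, 8, 15, 22, 29], "amount": [34, 45, 67, 89, 112]})

def temporal_features(orders: pd.DataFrame) -> dict:
    """A few of the hand-built features needed to recover the sequence."""
    weeks = (orders["day"] - 1) // 7 + 1        # which week each order falls in
    return {
        "week1_count": int((weeks == 1).sum()),  # front-loading indicator
        "value_trend": orders["amount"].diff().mean(),   # rising or falling?
        "mean_gap_days": orders["day"].diff().mean(),    # cadence
    }

print(temporal_features(a))  # 5 orders in week 1, negative value trend
print(temporal_features(b))  # 1 order in week 1, positive trend, steady 7-day gap
```

Each feature is another manual guess at what the raw timestamps already contain.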

Signal 3: Graph topology

Two accounts each have exactly 4 counterparties. Same flat feature. Completely different network structures:

transfers: Account X (normal business)

from   | to         | amount  | pattern
Acct X | Vendor A   | $5,200  | Monthly supplier payment
Acct X | Vendor B   | $3,100  | Monthly supplier payment
Acct X | Vendor C   | $8,400  | Quarterly contract
Acct X | Payroll Co | $12,000 | Bi-weekly payroll

Account X sends money to 4 independent vendors. None of the vendors transact with each other. Star topology: one hub, four spokes.

transfers: Account Y (suspected laundering ring)

from       | to         | amount | pattern
Acct Y     | Shell Co 1 | $4,900 | Just under $5K reporting threshold
Shell Co 1 | Shell Co 2 | $4,800 | Next day, minus fee
Shell Co 2 | Shell Co 3 | $4,700 | Next day, minus fee
Shell Co 3 | Acct Y     | $4,600 | Circular return after 3 days

Account Y's 4 counterparties form a cycle: Y → Shell 1 → Shell 2 → Shell 3 → Y. Money flows in a circle. Classic layering pattern.

Now here is what the compliance team's flat feature table shows:

flat_feature_table (what the AML model sees)

account | unique_counterparties | total_outflow | avg_txn_size | reality
Acct X  | 4                     | $28,700       | $7,175       | Normal business
Acct Y  | 4                     | $19,000       | $4,750       | Money laundering ring

Both accounts show unique_counterparties = 4, and Account Y actually looks less suspicious in the flat table: smaller average transaction size, lower total outflow. The table gives no indication that Y's money flows in a circle through shell companies (Y → Shell 1 → Shell 2 → Shell 3 → back to Y) while X's goes to four independent vendors.

The difference is the shape of the connections. Are the 4 counterparties connected to each other (a ring)? Or independent (a star)? A flat table reduces the entire network to a single number. Graph-based models see the topology natively: they detect cycles, measure clustering coefficients, and identify tightly-connected communities that flat-table models are structurally blind to.
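The topological distinction is easy to state once you operate on the graph itself. A minimal, dependency-free sketch separating the two accounts above with a depth-first cycle check (account names abbreviated for readability):

```python
# Adjacency lists for the two transfer networks in the tables above.
star = {"X": ["Vendor A", "Vendor B", "Vendor C", "Payroll Co"]}
ring = {"Y": ["Shell 1"], "Shell 1": ["Shell 2"],
        "Shell 2": ["Shell 3"], "Shell 3": ["Y"]}

def on_cycle(graph: dict, start: str) -> bool:
    """Depth-first search: does any directed path from `start` return to it?"""
    stack, seen = list(graph.get(start, [])), set()
    while stack:
        node = stack.pop()
        if node == start:
            return True          # money came back to where it started
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False

print(on_cycle(star, "X"))  # False: star topology, funds never return
print(on_cycle(ring, "Y"))  # True: Y -> Shell 1 -> Shell 2 -> Shell 3 -> Y
```

The flat feature `unique_counterparties = 4` is identical for both graphs; the cycle check is not. Graph-based models learn checks like this (and far subtler ones) from the edge structure directly.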

The recurring cost problem

Every new prediction question requires a new feature engineering effort. Want to predict churn? Build a feature table. Want to predict upsell? Build a different feature table. Want to predict fraud? A third feature table. The work does not compound. Each question starts from scratch.

This is why organizations with a $2M annual data science budget might deliver 3 to 5 predictive models per year. The models themselves take days to build. The feature engineering takes months.

Why most projects stall

  • Data lives across 10-50 relational tables
  • Tools require a single flat input table
  • Feature engineering takes 80% of project time
  • Every new question restarts the process
  • Multi-hop and temporal signals are lost in aggregation

What changes with foundation models

  • Model reads relational tables directly
  • No flat table or feature engineering required
  • Multi-hop and temporal patterns discovered automatically
  • Same model answers any prediction question
  • Time to first prediction: seconds, not months

The technology gap that caused this

Machine learning was built on flat data. Linear regression, random forests, gradient boosting, and even early deep learning architectures all assume tabular input: one row per sample, one column per feature. This assumption was baked into every major ML framework (scikit-learn, TensorFlow, PyTorch) and every AutoML platform built on top of them.

Relational databases, meanwhile, were designed around normalization: spreading information across multiple tables to avoid redundancy and maintain integrity. A well-designed database is the opposite of a flat table. It is a graph of interconnected entities.

For 30 years, data science bridged this gap manually. The entire discipline of feature engineering exists because ML models cannot read relational data natively. The tools got faster (Spark, Dask, distributed SQL), but the fundamental mismatch remained.

What a foundation model does with this data

Instead of manually computing features like “avg_units_sold_7d” and “promo_lift_ratio” across stores and products, a foundation model reads all three tables directly and discovers the cross-table patterns that drive demand.

PQL Query

PREDICT SUM(daily_sales.units_sold, 0, 7)
FOR EACH products.sku, stores.store_id

This single query replaces weeks of feature engineering. The model discovers promo lift patterns, regional seasonality, and supplier-driven substitution effects automatically.

Output

sku   | store_id | predicted_units_7d | top_signal
P4491 | S101     | 614                | Promo calendar + regional trend
P4491 | S102     | 289                | Baseline demand, no promo scheduled
P4491 | S103     | 178                | Smaller store, lower category penetration
P4492 | S101     | 402                | Cross-category substitution from P4491

How foundation models change the equation

The breakthrough came from treating a relational database as what it actually is: a graph. Every row is a node. Every foreign key is an edge. Timestamps create a temporal dimension. A database with 12 tables becomes a temporal heterogeneous graph with millions of nodes and edges.
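A minimal sketch of that row-to-node, foreign-key-to-edge construction, using a hypothetical orders table with two foreign keys (the table and column names are illustrative, not any library's API):

```python
# Each row becomes a node; each foreign-key value becomes a directed,
# timestamped edge to the row it references in another table.
orders = [
    {"order_id": "O1", "customer_id": "C1", "product_id": "P1", "ts": "2025-01-05"},
    {"order_id": "O2", "customer_id": "C1", "product_id": "P2", "ts": "2025-01-20"},
]
foreign_keys = {"customer_id": "customers", "product_id": "products"}

nodes, edges = [], []
for row in orders:
    node = ("orders", row["order_id"])      # node typed by its source table
    nodes.append(node)
    for col, target_table in foreign_keys.items():
        # One edge per foreign key, carrying the row's timestamp so the
        # graph keeps its temporal dimension.
        edges.append((node, (target_table, row[col]), row["ts"]))

print(len(nodes), len(edges))  # 2 nodes, 4 edges from this one table
```

Apply the same rule across 12 tables and millions of rows, and the database becomes the temporal heterogeneous graph the GNN trains on, with no flat table in between.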

Relational Deep Learning, published at ICML 2024 by researchers at Stanford and Kumo.ai, demonstrated that graph neural networks trained directly on this structure outperform manual feature engineering. On the RelBench benchmark (7 databases, 30 tasks, 103 million rows), the GNN approach beat a Stanford-trained data scientist's manual features on 11 of 12 classification tasks.

KumoRFM, a relational foundation model, pushes this further. Pre-trained on billions of rows across thousands of diverse relational databases, it has learned the universal patterns that recur in relational data: recency effects, frequency signals, temporal dynamics, graph topology, cross-table propagation. At inference time, you point it at your database and describe your prediction task in one line of PQL (Predictive Query Language). No training required.

The numbers are striking. On RelBench classification tasks, KumoRFM zero-shot achieves 76.71 AUROC, compared to 62.44 for LightGBM with manual feature engineering. Fine-tuning pushes KumoRFM to 81.14. The model that required zero human effort outperforms the approach that takes weeks.

What this means for your analytics strategy

If you are evaluating predictive analytics tools, the question is no longer "which model is best." XGBoost, LightGBM, and random forests are all good enough for most tabular prediction tasks. The question is: how does the tool handle your actual data?

If your data lives in a single flat table (a CSV export, a clean data warehouse table), traditional tools work fine. Pick one, train a model, deploy it.

If your data lives across multiple relational tables, which is the case for virtually every enterprise, the feature engineering step is where your project will succeed or fail. You have three options: staff a team to engineer features manually (expensive, slow, lossy), use automated feature generation tools like Featuretools (better but still limited to pre-programmed patterns), or use a model that reads relational data natively.

The third option did not exist two years ago. It does now. And for mission-critical predictions where every percentage point of accuracy translates to real revenue, the choice is not close. KumoRFM outperforms manual feature engineering by 14+ AUROC points on average across 30 benchmark tasks, with zero human effort. For a Fortune 500 company running 50 prediction models, the compound advantage is staggering: instead of a team of 20 data scientists spending 12 months on 5 models, you get 50 models in a week, each more accurate because the foundation model explores the full relational feature space that no human team can enumerate.

This matters most at enterprise scale. When you process millions of transactions per day, a 5% improvement in fraud detection is $50M in annual savings. When you serve 30 million customers, a 2% improvement in churn prediction is $200M in retained revenue. These are the stakes that justify a fundamentally different approach to predictive analytics, and they are exactly the scenarios where KumoRFM's advantage over traditional methods is largest.

The retailer from the opening example does not need a better analytics platform. They need a model that can read 12 tables directly, discover the cross-table patterns that drive demand, and deliver forecasts without a six-month feature engineering project. That technology exists.

Frequently asked questions

What is predictive analytics?

Predictive analytics uses historical data, statistical algorithms, and machine learning to forecast future outcomes. It goes beyond descriptive analytics (what happened) to answer what will happen next. Common applications include churn prediction, demand forecasting, fraud detection, and customer lifetime value estimation. The accuracy depends heavily on how well the underlying data is prepared and connected.

Why do most predictive analytics implementations fail?

The primary failure mode is the data preparation bottleneck. Enterprise data lives across 10 to 50 relational tables, and most predictive tools require a single flat table as input. Converting relational data to flat tables takes 80% of project time, introduces information loss, and creates a recurring cost for every new prediction question. Gartner estimated that 85% of analytics projects fail to deliver results.

What is the difference between predictive and prescriptive analytics?

Predictive analytics forecasts what will happen (this customer has a 73% probability of churning). Prescriptive analytics recommends what to do about it (offer this customer a 15% discount on their next order). Prescriptive systems build on predictive outputs by adding optimization or decision logic. Most organizations struggle with the predictive layer and never reach prescriptive capabilities.

What tools are used for predictive analytics?

Traditional tools include SAS, SPSS, Python scikit-learn, R, and cloud platforms like AWS SageMaker and Google Vertex AI. AutoML platforms like DataRobot and H2O automate model selection. All of these require pre-engineered flat feature tables. Foundation models like KumoRFM represent a new category that works directly on relational databases without feature engineering.

How do foundation models improve predictive analytics?

Foundation models for relational data, like KumoRFM, eliminate the feature engineering step that causes most predictive analytics projects to fail or stall. They read multi-table relational databases directly, discover cross-table patterns automatically, and deliver predictions in seconds rather than weeks. On the RelBench benchmark, KumoRFM achieved 76.71 AUROC zero-shot versus 62.44 for LightGBM with manually engineered features.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.