A mid-size retailer with 1,200 stores has five years of transaction data across 12 tables: point-of-sale records, inventory movements, supplier lead times, weather data by region, promotional calendars, loyalty program activity, returns, store-level staffing, and four more. They bought a predictive analytics platform to forecast demand by SKU by store by week. Eighteen months later, they are still running the same spreadsheet-based forecasts their planning team built in 2019.
The platform works. It can train models, tune hyperparameters, and generate forecasts. The problem is that it needs a single flat table as input: one row per SKU-store-week, with every relevant signal pre-computed as a column. Building that table from 12 source tables requires a team of three data engineers working for months. Every time the business asks a new question, the pipeline needs to be rebuilt.
This story repeats across industries. Gartner has estimated that 85% of analytics projects fail to deliver on their stated goals. The failure is rarely in the math. It is in the gap between how data is stored and how analytics tools consume it.
What predictive analytics actually means
Predictive analytics is the use of historical data, statistical algorithms, and machine learning to estimate the probability of future outcomes. The term covers a range of techniques, from simple linear regression to deep neural networks, applied to a common goal: answering "what will happen next" rather than "what happened."
To understand where predictive analytics fits, it helps to see the three levels of analytics maturity.
Descriptive analytics: what happened
Dashboards, reports, and KPIs. Last quarter's revenue was $47M. Churn rate was 4.2%. Average order value increased 8%. This is where most organizations spend most of their analytics effort. It is necessary but not sufficient, because knowing what happened does not tell you what to do about it.
Predictive analytics: what will happen
Using patterns in historical data to forecast future events. This customer has a 73% probability of churning in the next 30 days. Demand for SKU #4491 at the Chicago store will be 340 units next week. This transaction has a 91% probability of being fraudulent. Predictive analytics shifts decisions from reactive to proactive.
Prescriptive analytics: what should we do
Optimization on top of predictions. If we offer this customer a 15% discount, their churn probability drops to 31% and the expected lifetime value gain is $2,400. Prescriptive analytics requires a working predictive layer as input. Most organizations never get here because they cannot get predictions working reliably.
The real-world applications that work
Predictive analytics delivers the most value when three conditions are met: the prediction has a clear business action attached, the data history is long enough to contain patterns, and the entity being predicted has enough volume for statistical significance.
Churn prediction
Telecom, SaaS, and subscription businesses use churn models to identify customers likely to cancel. T-Mobile reported reducing churn by 50% using predictive models on customer interaction data. The key signal often comes from behavioral sequences (declining usage, support ticket spikes, payment delays) rather than static attributes.
Demand forecasting
Walmart processes 2.5 petabytes of data per hour across its supply chain. Their demand forecasting models integrate point-of-sale data, weather, local events, and economic indicators to predict store-level demand. A 1% improvement in forecast accuracy for a retailer of Walmart's scale can translate to hundreds of millions in reduced inventory waste.
Here is what the underlying data looks like for a mid-size retailer. Three tables drive demand prediction: stores, products, and daily sales.
stores
| store_id | city | region | sqft | opened |
|---|---|---|---|---|
| S101 | Chicago | Midwest | 42,000 | 2018-03-15 |
| S102 | Houston | South | 38,500 | 2019-07-22 |
| S103 | Portland | West | 29,000 | 2021-01-10 |
products
| sku | name | category | price | supplier |
|---|---|---|---|---|
| P4491 | Organic Oat Milk 64oz | Dairy Alt | $5.99 | Oatly |
| P4492 | Almond Milk 64oz | Dairy Alt | $4.49 | Blue Diamond |
| P4493 | Soy Milk 32oz | Dairy Alt | $3.29 | Silk |
daily_sales
| date | store_id | sku | units_sold | promo_active |
|---|---|---|---|---|
| 2025-12-01 | S101 | P4491 | 87 | No |
| 2025-12-02 | S101 | P4491 | 134 | Yes |
| 2025-12-03 | S101 | P4491 | 142 | Yes |
| 2025-12-04 | S101 | P4491 | 61 | No |
| 2025-12-01 | S102 | P4491 | 43 | No |
| 2025-12-02 | S102 | P4491 | 39 | No |
In the highlighted rows, the promotion drives a demand spike in Chicago (S101), while Houston (S102) runs no promo and shows a lower baseline. A flat model sees 'units_sold' but misses the store-region-promo interaction.
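To make that interaction concrete, here is a minimal pandas sketch of the kind of per-store promo-lift feature a data engineer would have to compute by hand. It uses only the toy rows from the daily_sales table above; the column and variable names are illustrative, not from any real pipeline.

```python
import pandas as pd

# The six daily_sales rows from the table above
sales = pd.DataFrame({
    "date": pd.to_datetime(["2025-12-01", "2025-12-02", "2025-12-03",
                            "2025-12-04", "2025-12-01", "2025-12-02"]),
    "store_id": ["S101", "S101", "S101", "S101", "S102", "S102"],
    "sku": ["P4491"] * 6,
    "units_sold": [87, 134, 142, 61, 43, 39],
    "promo_active": ["No", "Yes", "Yes", "No", "No", "No"],
})

# Per-store promo lift: mean units on promo days vs non-promo days
lift = (sales.groupby(["store_id", "promo_active"])["units_sold"]
             .mean()
             .unstack("promo_active"))
lift["lift_ratio"] = lift["Yes"] / lift["No"]
print(lift)  # S101 lift ~1.86x; S102 has no promo days, so its lift is NaN
```

This is one feature for one SKU-store interaction. The manual approach requires deciding up front which of these interactions matter, then writing a column like this for each one.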
Fraud detection
JPMorgan Chase processes $10 trillion in payments annually. Their fraud detection systems score transactions in real time, flagging anomalies based on transaction patterns, merchant history, and network-level signals. The challenge is that fraud patterns evolve constantly, requiring models that adapt without monthly retraining cycles.
Customer lifetime value
Starbucks uses predictive models to estimate the long-term value of each loyalty program member, then allocates marketing spend accordingly. The models incorporate purchase frequency, category preferences, store visit patterns, and response to past promotions. The signal spans multiple tables and multiple time horizons.
Why most implementations fail
The failure rate of predictive analytics projects is strikingly high. Beyond the Gartner 85% figure, VentureBeat reported that 87% of data science projects never make it to production. These are not small organizations. These are companies with dedicated data teams, modern infrastructure, and significant budgets.
The root cause is almost always the same: the data preparation bottleneck.
The flat table requirement
Every major predictive analytics tool, from scikit-learn to SAS to DataRobot, requires input in the form of a flat table. One row per entity, one column per feature. But enterprise data is not flat. It lives in relational databases with 10 to 50 interconnected tables linked by foreign keys.
Converting relational data to a flat table requires: deciding which tables to join, choosing aggregation functions (sum, count, average, max), selecting time windows (7 days, 30 days, 90 days), handling missing values, and encoding categorical variables. A Stanford study measured this process at 12.3 hours and 878 lines of code per prediction task for experienced data scientists.
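As an illustration of those decisions, here is a hedged pandas sketch of one such flattening pipeline. All table and column names are made up; every choice in it (window length, aggregation set, fill value, categorical encoding) is a per-task judgment call, which is exactly why the process resists reuse.

```python
import pandas as pd

# Toy source tables (illustrative names, not a real schema)
customers = pd.DataFrame({"customer_id": [1, 2], "segment": ["A", "B"]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [100.0, 40.0, 75.0],
    "ts": pd.to_datetime(["2025-03-01", "2025-03-20", "2024-11-05"]),
})

cutoff = pd.Timestamp("2025-03-31")
# Decision 1: time window (here, trailing 90 days)
window = orders[orders["ts"] > cutoff - pd.Timedelta(days=90)]

# Decision 2: aggregation functions (count, sum, mean)
feats = (window.groupby("customer_id")["amount"]
               .agg(order_count="count", total_spend="sum", avg_spend="mean")
               .reset_index())

# Decision 3: join choice and missing-value handling (customer 2 has no
# orders in the window, so all of its features get filled with 0)
flat = customers.merge(feats, on="customer_id", how="left").fillna(0)

# Decision 4: categorical encoding
flat = pd.get_dummies(flat, columns=["segment"])
print(flat)
```

Multiply these four decisions by a dozen source tables and dozens of candidate signals, and the 12.3-hour, 878-line figure stops looking surprising.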
The information loss problem: three signals that get destroyed
Flattening relational data into a single table destroys three specific types of signal. Each one is worth understanding, because they are often the difference between a model that catches 60% of churn and one that catches 85%. For a bank with 10 million customers, that 25-point gap is hundreds of millions in retained revenue.
Signal 1: Multi-hop relationships
Consider a bank trying to predict which customers will default on their credit card. Here is the actual data across three tables:
customers
| customer_id | name | credit_limit | account_age | status |
|---|---|---|---|---|
| C001 | Sarah Chen | $15,000 | 4 years | Current |
| C002 | James Wilson | $8,000 | 2 years | Current |
| C003 | Maria Lopez | $12,000 | 3 years | Defaulted |
| C004 | David Kim | $10,000 | 1 year | Defaulted |
C003 and C004 both defaulted. The question: can we predict whether C001 or C002 will follow?
transactions
| txn_id | customer_id | merchant_id | amount | date |
|---|---|---|---|---|
| T1001 | C001 | M50 | $247 | 2025-01-15 |
| T1002 | C001 | M51 | $89 | 2025-02-03 |
| T1003 | C002 | M50 | $312 | 2025-01-22 |
| T1004 | C002 | M52 | $45 | 2025-02-10 |
| T1005 | C003 | M50 | $198 | 2024-10-05 |
| T1006 | C003 | M51 | $156 | 2024-11-12 |
| T1007 | C004 | M50 | $267 | 2024-09-18 |
| T1008 | C004 | M51 | $134 | 2024-10-30 |
Look at the merchant IDs. C003 and C004 (who defaulted) both shopped at M50 and M51. C001 also shops at M50 and M51. C002 shops at M50 and M52, but not M51.
merchants
| merchant_id | name | category | risk_score |
|---|---|---|---|
| M50 | QuickCash Advance | Cash Services | High |
| M51 | EZ Pawn & Jewelry | Pawn Shop | High |
| M52 | Whole Foods Market | Grocery | Low |
Now follow the multi-hop path. C001 shops at M50 (QuickCash Advance) and M51 (EZ Pawn). Both of these merchants are also frequented by C003 and C004, who defaulted. The path is: C001 → transactions → merchants (M50, M51) → transactions (of C003, C004) → default status. Four hops.
Now here is what a data scientist actually builds. They flatten these three tables into a single row per customer:
flat_feature_table (what XGBoost sees)
| customer_id | txn_count | avg_amount | credit_util | account_age_yrs |
|---|---|---|---|---|
| C001 | 2 | $168.00 | 62% | 4 |
| C002 | 2 | $178.50 | 55% | 2 |
| C003 | 2 | $177.00 | 81% | 3 |
| C004 | 2 | $200.50 | 78% | 1 |
All four customers show txn_count = 2 and similar average amounts. The flat table gives no indication that C001 shares both merchants with the defaulted customers while C002 shares only one. That merchant-overlap signal, the strongest predictor of default risk in this example, is invisible: it was destroyed during flattening because recovering it requires traversing four hops across three tables.
No data scientist would write this feature. Not because the SQL is hard, but because you would need to imagine that merchant-overlap with defaulted customers is predictive, then write a 4-way join to test it. Across thousands of possible multi-hop paths, humans explore a tiny fraction. On the RelBench benchmark, models that traverse these paths automatically outperform flat-table models by 14 AUROC points on average (76.71 vs 62.44).
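For concreteness, here is what that merchant-overlap feature looks like in pandas, using the toy tables above. The code is short; the point is that someone must first hypothesize that overlap with defaulters' merchants is predictive before anyone thinks to write it.

```python
import pandas as pd

# The customers and transactions tables from the example
customers = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003", "C004"],
    "status": ["Current", "Current", "Defaulted", "Defaulted"],
})
transactions = pd.DataFrame({
    "customer_id": ["C001", "C001", "C002", "C002",
                    "C003", "C003", "C004", "C004"],
    "merchant_id": ["M50", "M51", "M50", "M52",
                    "M50", "M51", "M50", "M51"],
})

# Hop 1-2: merchants visited by customers who later defaulted
defaulter_merchants = set(
    transactions.merge(customers[customers["status"] == "Defaulted"],
                       on="customer_id")["merchant_id"]
)

# Hop 3-4: how many of each current customer's merchants fall in that set
overlap = (transactions.merge(customers[customers["status"] == "Current"],
                              on="customer_id")
           .assign(shared=lambda d: d["merchant_id"].isin(defaulter_merchants))
           .groupby("customer_id")["shared"].sum())
print(overlap)  # C001: 2 shared merchants, C002: 1
```

One candidate feature, hand-built. The argument in the text is that thousands of such multi-hop paths exist, and humans test only the ones they think of.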
Signal 2: Temporal sequences within aggregation windows
Two customers each placed 5 orders in 30 days. Same count. Completely different stories:
orders: Customer A (disengaging)
| order_id | date | amount | days_since_prev |
|---|---|---|---|
| O201 | Mar 1 | $89 | - |
| O202 | Mar 2 | $67 | 1 |
| O203 | Mar 3 | $45 | 1 |
| O204 | Mar 5 | $34 | 2 |
| O205 | Mar 7 | $22 | 2 |
5 orders in the first week, then silence for 23 days. Declining amounts. This customer is disengaging.
orders: Customer B (accelerating)
| order_id | date | amount | days_since_prev |
|---|---|---|---|
| O301 | Mar 1 | $34 | - |
| O302 | Mar 8 | $45 | 7 |
| O303 | Mar 15 | $67 | 7 |
| O304 | Mar 22 | $89 | 7 |
| O305 | Mar 29 | $112 | 7 |
1 order per week, steady cadence, increasing amounts. This customer is accelerating.
Now here is what the data scientist builds:
flat_feature_table (what the model sees)
| customer | order_count_30d | avg_order_value | total_spend | reality |
|---|---|---|---|---|
| Customer A | 5 | $51.40 | $257 | Disengaging (churn risk) |
| Customer B | 5 | $69.40 | $347 | Accelerating (growth) |
Both rows show order_count_30d = 5 and similar averages, yet the realities are opposite: Customer A crammed five declining orders into week 1 and then disappeared (churn risk), while Customer B placed one rising-value order per week (growth opportunity). A model trained on this table cannot tell them apart because the aggregation function collapsed the temporal dimension. The when and the trajectory are gone.
To recover this signal manually, you would need to engineer: week-1 count, week-2 count, week-3 count, week-4 count, order value trend slope, inter-order interval trend, acceleration/deceleration flag, last-7-day vs first-7-day ratio. That is 8+ features to recover what a model operating on raw transaction sequences sees natively in the timestamps and amounts.
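A sketch of that manual recovery, computing a few of those features from Customer A's raw rows in the table above (feature names are illustrative):

```python
import numpy as np
import pandas as pd

# Customer A's orders from the example (the "disengaging" pattern)
orders = pd.DataFrame({
    "date": pd.to_datetime(["2025-03-01", "2025-03-02", "2025-03-03",
                            "2025-03-05", "2025-03-07"]),
    "amount": [89.0, 67.0, 45.0, 34.0, 22.0],
})

start = pd.Timestamp("2025-03-01")
week = ((orders["date"] - start).dt.days // 7) + 1

features = {
    # Per-week counts recover the burst-then-silence shape
    **{f"week{w}_count": int((week == w).sum()) for w in range(1, 5)},
    # Linear slope of order value over order index: negative = declining
    "value_trend_slope": float(np.polyfit(range(len(orders)),
                                          orders["amount"], 1)[0]),
    # Mean gap in days between consecutive orders
    "avg_interval_days": float(orders["date"].diff().dt.days.mean()),
}
print(features)  # week1_count=5, weeks 2-4 empty, strongly negative slope
```

Each of these is a separate column someone must imagine, implement, and maintain, per prediction task, whereas a sequence-aware model reads the timestamps and amounts directly.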
Signal 3: Graph topology
Two accounts each have exactly 4 counterparties. Same flat feature. Completely different network structures:
transfers: Account X (normal business)
| from | to | amount | pattern |
|---|---|---|---|
| Acct X | Vendor A | $5,200 | Monthly supplier payment |
| Acct X | Vendor B | $3,100 | Monthly supplier payment |
| Acct X | Vendor C | $8,400 | Quarterly contract |
| Acct X | Payroll Co | $12,000 | Bi-weekly payroll |
Account X sends money to 4 independent vendors. None of the vendors transact with each other. Star topology: one hub, four spokes.
transfers: Account Y (suspected laundering ring)
| from | to | amount | pattern |
|---|---|---|---|
| Acct Y | Shell Co 1 | $4,900 | Just under $5K reporting threshold |
| Shell Co 1 | Shell Co 2 | $4,800 | Next day, minus fee |
| Shell Co 2 | Shell Co 3 | $4,700 | Next day, minus fee |
| Shell Co 3 | Acct Y | $4,600 | Circular return after 3 days |
Account Y's 4 counterparties form a cycle: Y → Shell 1 → Shell 2 → Shell 3 → Y. Money flows in a circle. Classic layering pattern.
Now here is what the compliance team's flat feature table shows:
flat_feature_table (what the AML model sees)
| account | unique_counterparties | total_outflow | avg_txn_size | reality |
|---|---|---|---|---|
| Acct X | 4 | $28,700 | $7,175 | Normal business |
| Acct Y | 4 | $19,000 | $4,750 | Money laundering ring |
Both accounts show unique_counterparties = 4, and Account Y actually looks less suspicious in the flat table: smaller average transaction size, lower total outflow. But Y's money flows in a circle through shell companies (Y → Shell 1 → Shell 2 → Shell 3 → back to Y), while X pays independent vendors. The flat table gives no hint of the difference.
The difference is the shape of the connections. Are the 4 counterparties connected to each other (a ring)? Or independent (a star)? A flat table reduces the entire network to a single number. Graph-based models see the topology natively: they detect cycles, measure clustering coefficients, and identify tightly-connected communities that flat-table models are structurally blind to.
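Detecting the ring requires traversal, not aggregation. A minimal pure-Python sketch, using the two transfer patterns above (account and counterparty names are illustrative):

```python
def on_cycle(edges, account):
    """True if `account` lies on a directed cycle in the transfer graph."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    # Walk forward from the account's counterparties; reaching the
    # account again means money can flow in a circle back to it
    frontier, seen = list(adj.get(account, [])), set()
    while frontier:
        node = frontier.pop()
        if node == account:
            return True
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(adj.get(node, []))
    return False

# Star topology (Account X) vs ring topology (Account Y)
x_edges = [("X", "Vendor A"), ("X", "Vendor B"),
           ("X", "Vendor C"), ("X", "Payroll Co")]
y_edges = [("Y", "Shell 1"), ("Shell 1", "Shell 2"),
           ("Shell 2", "Shell 3"), ("Shell 3", "Y")]

print(on_cycle(x_edges, "X"))  # False: independent vendors, no path back
print(on_cycle(y_edges, "Y"))  # True: Y -> Shell 1 -> Shell 2 -> Shell 3 -> Y
```

No combination of per-account counts or sums yields this answer, because the feature is a property of the edges between counterparties, not of the account's own rows.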
The recurring cost problem
Every new prediction question requires a new feature engineering effort. Want to predict churn? Build a feature table. Want to predict upsell? Build a different feature table. Want to predict fraud? A third feature table. The work does not compound. Each question starts from scratch.
This is why organizations with a $2M annual data science budget might deliver 3 to 5 predictive models per year. The models themselves take days to build. The feature engineering takes months.
Why most projects stall
- Data lives across 10-50 relational tables
- Tools require a single flat input table
- Feature engineering takes 80% of project time
- Every new question restarts the process
- Multi-hop and temporal signals are lost in aggregation
What changes with foundation models
- Model reads relational tables directly
- No flat table or feature engineering required
- Multi-hop and temporal patterns discovered automatically
- Same model answers any prediction question
- Time to first prediction: seconds, not months
The technology gap that caused this
Machine learning was built on flat data. Linear regression, random forests, gradient boosting, and even early deep learning architectures all assume tabular input: one row per sample, one column per feature. This assumption was baked into every major ML framework (scikit-learn, TensorFlow, PyTorch) and every AutoML platform built on top of them.
Relational databases, meanwhile, were designed around normalization: spreading information across multiple tables to avoid redundancy and maintain integrity. A well-designed database is the opposite of a flat table. It is a graph of interconnected entities.
For 30 years, data science bridged this gap manually. The entire discipline of feature engineering exists because ML models cannot read relational data natively. The tools got faster (Spark, Dask, distributed SQL), but the fundamental mismatch remained.
What a foundation model does with this data
Instead of manually computing features like “avg_units_sold_7d” and “promo_lift_ratio” across stores and products, a foundation model reads all three tables directly and discovers the cross-table patterns that drive demand.
PQL Query
PREDICT SUM(daily_sales.units_sold, 0, 7) FOR EACH products.sku, stores.store_id
This single query replaces weeks of feature engineering. The model discovers promo lift patterns, regional seasonality, and supplier-driven substitution effects automatically.
Output
| sku | store_id | predicted_units_7d | top_signal |
|---|---|---|---|
| P4491 | S101 | 614 | Promo calendar + regional trend |
| P4491 | S102 | 289 | Baseline demand, no promo scheduled |
| P4491 | S103 | 178 | Smaller store, lower category penetration |
| P4492 | S101 | 402 | Cross-category substitution from P4491 |
How foundation models change the equation
The breakthrough came from treating a relational database as what it actually is: a graph. Every row is a node. Every foreign key is an edge. Timestamps create a temporal dimension. A database with 12 tables becomes a temporal heterogeneous graph with millions of nodes and edges.
Relational Deep Learning, published at ICML 2024 by researchers at Stanford and Kumo.ai, demonstrated that graph neural networks trained directly on this structure outperform manual feature engineering. On the RelBench benchmark (7 databases, 30 tasks, 103 million rows), the GNN approach beat a Stanford-trained data scientist's manual features on 11 of 12 classification tasks.
KumoRFM, a relational foundation model, pushes this further. Pre-trained on billions of rows across thousands of diverse relational databases, it has learned the universal patterns that recur in relational data: recency effects, frequency signals, temporal dynamics, graph topology, cross-table propagation. At inference time, you point it at your database and describe your prediction task in one line of PQL (Predictive Query Language). No training required.
The numbers are striking. On RelBench classification tasks, KumoRFM zero-shot achieves 76.71 AUROC, compared to 62.44 for LightGBM with manual feature engineering. Fine-tuning pushes KumoRFM to 81.14. The model that required zero human effort outperforms the approach that takes weeks.
What this means for your analytics strategy
If you are evaluating predictive analytics tools, the question is no longer "which model is best." XGBoost, LightGBM, and random forests are all good enough for most tabular prediction tasks. The question is: how does the tool handle your actual data?
If your data lives in a single flat table (a CSV export, a clean data warehouse table), traditional tools work fine. Pick one, train a model, deploy it.
If your data lives across multiple relational tables, which is the case for virtually every enterprise, the feature engineering step is where your project will succeed or fail. You have three options: staff a team to engineer features manually (expensive, slow, lossy), use automated feature generation tools like Featuretools (better but still limited to pre-programmed patterns), or use a model that reads relational data natively.
The third option did not exist two years ago. It does now. And for mission-critical predictions where every percentage point of accuracy translates to real revenue, the choice is not close. KumoRFM outperforms manual feature engineering by 14+ AUROC points on average across 30 benchmark tasks, with zero human effort. For a Fortune 500 company running 50 prediction models, the compound advantage is staggering: instead of a team of 20 data scientists spending 12 months on 5 models, you get 50 models in a week, each more accurate because the foundation model explores the full relational feature space that no human team can enumerate.
This matters most at enterprise scale. When you process millions of transactions per day, a 5% improvement in fraud detection is $50M in annual savings. When you serve 30 million customers, a 2% improvement in churn prediction is $200M in retained revenue. These are the stakes that justify a fundamentally different approach to predictive analytics, and they are exactly the scenarios where KumoRFM's advantage over traditional methods is largest.
The retailer from the opening example does not need a better analytics platform. They need a model that can read 12 tables directly, discover the cross-table patterns that drive demand, and deliver forecasts without a six-month feature engineering project. That technology exists.