A mid-size retailer with 1,200 stores has five years of transaction data across 12 tables: point-of-sale records, inventory movements, supplier lead times, weather data by region, promotional calendars, loyalty program activity, returns, store-level staffing, and four more. They bought a predictive analytics platform to forecast demand by SKU by store by week. Eighteen months later, they are still running the same spreadsheet-based forecasts their planning team built in 2019.
The platform works. It can train models, tune hyperparameters, and generate forecasts. The problem is that it needs a single flat table as input: one row per SKU-store-week, with every relevant signal pre-computed as a column. Building that table from 12 source tables requires a team of three data engineers working for months. Every time the business asks a new question, the pipeline needs to be rebuilt.
This story repeats across industries. Gartner has estimated that 85% of analytics projects fail to deliver on their stated goals. The failure is rarely in the math. It is in the gap between how data is stored and how analytics tools consume it.
What predictive analytics actually means
Predictive analytics is the use of historical data, statistical algorithms, and machine learning to estimate the probability of future outcomes. The term covers a range of techniques, from simple linear regression to deep neural networks, applied to a common goal: answering "what will happen next" rather than "what happened."
To understand where predictive analytics fits, it helps to see the three levels of analytics maturity.
Descriptive analytics: what happened
Dashboards, reports, and KPIs. Last quarter's revenue was $47M. Churn rate was 4.2%. Average order value increased 8%. This is where most organizations spend most of their analytics effort. It is necessary but not sufficient, because knowing what happened does not tell you what to do about it.
Predictive analytics: what will happen
Using patterns in historical data to forecast future events. This customer has a 73% probability of churning in the next 30 days. Demand for SKU #4491 at the Chicago store will be 340 units next week. This transaction has a 91% probability of being fraudulent. Predictive analytics shifts decisions from reactive to proactive.
Prescriptive analytics: what should we do
Optimization on top of predictions. If we offer this customer a 15% discount, their churn probability drops to 31% and the expected lifetime value gain is $2,400. Prescriptive analytics requires a working predictive layer as input. Most organizations never get here because they cannot get predictions working reliably.
The real-world applications that work
Predictive analytics delivers the most value when three conditions are met: the prediction has a clear business action attached, the data history is long enough to contain patterns, and the entity being predicted has enough volume for statistical significance.
Churn prediction
Telecom, SaaS, and subscription businesses use churn models to identify customers likely to cancel. T-Mobile reported reducing churn by 50% using predictive models on customer interaction data. The key signal often comes from behavioral sequences (declining usage, support ticket spikes, payment delays) rather than static attributes.
Demand forecasting
Walmart processes 2.5 petabytes of data per hour across its supply chain. Their demand forecasting models integrate point-of-sale data, weather, local events, and economic indicators to predict store-level demand. A 1% improvement in forecast accuracy for a retailer of Walmart's scale can translate to hundreds of millions in reduced inventory waste.
Here is what the underlying data looks like for a mid-size retailer. Three tables drive demand prediction: stores, products, and daily sales.
stores
| store_id | city | region | sqft | opened |
|---|---|---|---|---|
| S101 | Chicago | Midwest | 42,000 | 2018-03-15 |
| S102 | Houston | South | 38,500 | 2019-07-22 |
| S103 | Portland | West | 29,000 | 2021-01-10 |
products
| sku | name | category | price | supplier |
|---|---|---|---|---|
| P4491 | Organic Oat Milk 64oz | Dairy Alt | $5.99 | Oatly |
| P4492 | Almond Milk 64oz | Dairy Alt | $4.49 | Blue Diamond |
| P4493 | Soy Milk 32oz | Dairy Alt | $3.29 | Silk |
daily_sales
| date | store_id | sku | units_sold | promo_active |
|---|---|---|---|---|
| 2025-12-01 | S101 | P4491 | 87 | No |
| 2025-12-02 | S101 | P4491 | 134 | Yes |
| 2025-12-03 | S101 | P4491 | 142 | Yes |
| 2025-12-04 | S101 | P4491 | 61 | No |
| 2025-12-01 | S102 | P4491 | 43 | No |
| 2025-12-02 | S102 | P4491 | 39 | No |
In the highlighted rows, the promotion drives a demand spike in Chicago (S101), while Houston (S102) runs no promo and shows a lower baseline. A flat model sees 'units_sold' but misses the store-region-promo interaction.
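To make that interaction concrete, here is a minimal pandas sketch of the kind of per-store promo-lift feature a data engineer would have to compute by hand. It uses only the toy rows from the daily_sales table above; the column and variable names are illustrative, not from any real pipeline.

```python
import pandas as pd

# The six daily_sales rows from the table above
sales = pd.DataFrame({
    "date": pd.to_datetime(["2025-12-01", "2025-12-02", "2025-12-03",
                            "2025-12-04", "2025-12-01", "2025-12-02"]),
    "store_id": ["S101", "S101", "S101", "S101", "S102", "S102"],
    "sku": ["P4491"] * 6,
    "units_sold": [87, 134, 142, 61, 43, 39],
    "promo_active": ["No", "Yes", "Yes", "No", "No", "No"],
})

# Per-store promo lift: mean units on promo days vs non-promo days
lift = (sales.groupby(["store_id", "promo_active"])["units_sold"]
             .mean()
             .unstack("promo_active"))
lift["lift_ratio"] = lift["Yes"] / lift["No"]
print(lift)  # S101 lift ~1.86x; S102 has no promo days, so its lift is NaN
```

This is one feature for one SKU-store interaction. The manual approach requires deciding up front which of these interactions matter, then writing a column like this for each one.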
Fraud detection
JPMorgan Chase processes $10 trillion in payments annually. Their fraud detection systems score transactions in real time, flagging anomalies based on transaction patterns, merchant history, and network-level signals. The challenge is that fraud patterns evolve constantly, requiring models that adapt without monthly retraining cycles.
Customer lifetime value
Starbucks uses predictive models to estimate the long-term value of each loyalty program member, then allocates marketing spend accordingly. The models incorporate purchase frequency, category preferences, store visit patterns, and response to past promotions. The signal spans multiple tables and multiple time horizons.
Why most implementations fail
The failure rate of predictive analytics projects is strikingly high. Beyond the Gartner 85% figure, VentureBeat reported that 87% of data science projects never make it to production. These are not small organizations. These are companies with dedicated data teams, modern infrastructure, and significant budgets.
The root cause is almost always the same: the data preparation bottleneck.
The flat table requirement
Every major predictive analytics tool, from scikit-learn to SAS to DataRobot, requires input in the form of a flat table. One row per entity, one column per feature. But enterprise data is not flat. It lives in relational databases with 10 to 50 interconnected tables linked by foreign keys.
Converting relational data to a flat table requires: deciding which tables to join, choosing aggregation functions (sum, count, average, max), selecting time windows (7 days, 30 days, 90 days), handling missing values, and encoding categorical variables. A Stanford study measured this process at 12.3 hours and 878 lines of code per prediction task for experienced data scientists.
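As an illustration of those decisions, here is a hedged pandas sketch of one such flattening pipeline. All table and column names are made up; every choice in it (window length, aggregation set, fill value, categorical encoding) is a per-task judgment call, which is exactly why the process resists reuse.

```python
import pandas as pd

# Toy source tables (illustrative names, not a real schema)
customers = pd.DataFrame({"customer_id": [1, 2], "segment": ["A", "B"]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [100.0, 40.0, 75.0],
    "ts": pd.to_datetime(["2025-03-01", "2025-03-20", "2024-11-05"]),
})

cutoff = pd.Timestamp("2025-03-31")
# Decision 1: time window (here, trailing 90 days)
window = orders[orders["ts"] > cutoff - pd.Timedelta(days=90)]

# Decision 2: aggregation functions (count, sum, mean)
feats = (window.groupby("customer_id")["amount"]
               .agg(order_count="count", total_spend="sum", avg_spend="mean")
               .reset_index())

# Decision 3: join choice and missing-value handling (customer 2 has no
# orders in the window, so all of its features get filled with 0)
flat = customers.merge(feats, on="customer_id", how="left").fillna(0)

# Decision 4: categorical encoding
flat = pd.get_dummies(flat, columns=["segment"])
print(flat)
```

Multiply these four decisions by a dozen source tables and dozens of candidate signals, and the 12.3-hour, 878-line figure stops looking surprising.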
The information loss problem: three signals that get destroyed
Flattening relational data into a single table destroys three specific types of signal. Each one is worth understanding, because they are often the difference between a model that catches 60% of churn and one that catches 85%. For a bank with 10 million customers, that 25-point gap is hundreds of millions in retained revenue.
Signal 1: Multi-hop relationships
Consider a bank trying to predict which customers will default on their credit card. Here is the actual data across three tables:
customers
| customer_id | name | credit_limit | account_age | status |
|---|---|---|---|---|
| C001 | Sarah Chen | $15,000 | 4 years | Current |
| C002 | James Wilson | $8,000 | 2 years | Current |
| C003 | Maria Lopez | $12,000 | 3 years | Defaulted |
| C004 | David Kim | $10,000 | 1 year | Defaulted |
C003 and C004 both defaulted. The question: can we predict whether C001 or C002 will follow?
transactions
| txn_id | customer_id | merchant_id | amount | date |
|---|---|---|---|---|
| T1001 | C001 | M50 | $247 | 2025-01-15 |
| T1002 | C001 | M51 | $89 | 2025-02-03 |
| T1003 | C002 | M50 | $312 | 2025-01-22 |
| T1004 | C002 | M52 | $45 | 2025-02-10 |
| T1005 | C003 | M50 | $198 | 2024-10-05 |
| T1006 | C003 | M51 | $156 | 2024-11-12 |
| T1007 | C004 | M50 | $267 | 2024-09-18 |
| T1008 | C004 | M51 | $134 | 2024-10-30 |
Look at the merchant IDs. C003 and C004 (who defaulted) both shopped at M50 and M51. C001 also shops at M50 and M51. C002 shops at M50 and M52, but not M51.
merchants
| merchant_id | name | category | risk_score |
|---|---|---|---|
| M50 | QuickCash Advance | Cash Services | High |
| M51 | EZ Pawn & Jewelry | Pawn Shop | High |
| M52 | Whole Foods Market | Grocery | Low |
Now follow the multi-hop path. C001 shops at M50 (QuickCash Advance) and M51 (EZ Pawn). Both of these merchants are also frequented by C003 and C004, who defaulted. The path is: C001 → transactions → merchants (M50, M51) → transactions (of C003, C004) → default status. Four hops.
Now here is what a data scientist actually builds. They flatten these three tables into a single row per customer:
flat_feature_table (what XGBoost sees)
| customer_id | txn_count | avg_amount | credit_util | account_age_yrs |
|---|---|---|---|---|
| C001 | 2 | $168.00 | 62% | 4 |
| C002 | 2 | $178.50 | 55% | 2 |
| C003 | 2 | $177.00 | 81% | 3 |
| C004 | 2 | $200.50 | 78% | 1 |
All four customers show txn_count = 2 and similar average amounts. The flat table gives no indication that C001 shares both merchants with the defaulted customers while C002 shares only one. That merchant-overlap signal, the strongest predictor of default risk in this example, is invisible: it was destroyed during flattening because recovering it requires traversing four hops across three tables.
No data scientist would write this feature. Not because the SQL is hard, but because you would need to imagine that merchant-overlap with defaulted customers is predictive, then write a 4-way join to test it. Across thousands of possible multi-hop paths, humans explore a tiny fraction. On the RelBench benchmark, models that traverse these paths automatically outperform flat-table models by 14 AUROC points on average (76.71 vs 62.44).
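For concreteness, here is what that merchant-overlap feature looks like in pandas, using the toy tables above. The code is short; the point is that someone must first hypothesize that overlap with defaulters' merchants is predictive before anyone thinks to write it.

```python
import pandas as pd

# The customers and transactions tables from the example
customers = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003", "C004"],
    "status": ["Current", "Current", "Defaulted", "Defaulted"],
})
transactions = pd.DataFrame({
    "customer_id": ["C001", "C001", "C002", "C002",
                    "C003", "C003", "C004", "C004"],
    "merchant_id": ["M50", "M51", "M50", "M52",
                    "M50", "M51", "M50", "M51"],
})

# Hop 1-2: merchants visited by customers who later defaulted
defaulter_merchants = set(
    transactions.merge(customers[customers["status"] == "Defaulted"],
                       on="customer_id")["merchant_id"]
)

# Hop 3-4: how many of each current customer's merchants fall in that set
overlap = (transactions.merge(customers[customers["status"] == "Current"],
                              on="customer_id")
           .assign(shared=lambda d: d["merchant_id"].isin(defaulter_merchants))
           .groupby("customer_id")["shared"].sum())
print(overlap)  # C001: 2 shared merchants, C002: 1
```

One candidate feature, hand-built. The argument in the text is that thousands of such multi-hop paths exist, and humans test only the ones they think of.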
Signal 2: Temporal sequences within aggregation windows
Two customers each placed 5 orders in 30 days. Same count. Completely different stories:
orders: Customer A (disengaging)
| order_id | date | amount | days_since_prev |
|---|---|---|---|
| O201 | Mar 1 | $89 | - |
| O202 | Mar 2 | $67 | 1 |
| O203 | Mar 3 | $45 | 1 |
| O204 | Mar 5 | $34 | 2 |
| O205 | Mar 7 | $22 | 2 |
5 orders in the first week, then silence for 23 days. Declining amounts. This customer is disengaging.
orders: Customer B (accelerating)
| order_id | date | amount | days_since_prev |
|---|---|---|---|
| O301 | Mar 1 | $34 | - |
| O302 | Mar 8 | $45 | 7 |
| O303 | Mar 15 | $67 | 7 |
| O304 | Mar 22 | $89 | 7 |
| O305 | Mar 29 | $112 | 7 |
1 order per week, steady cadence, increasing amounts. This customer is accelerating.
Now here is what the data scientist builds:
flat_feature_table (what the model sees)
| customer | order_count_30d | avg_order_value | total_spend | reality |
|---|---|---|---|---|
| Customer A | 5 | $51.40 | $257 | Disengaging (churn risk) |
| Customer B | 5 | $69.40 | $347 | Accelerating (growth) |
Both rows show order_count_30d = 5 and similar averages, yet the realities are opposite: Customer A crammed five declining orders into week 1 and then disappeared (churn risk), while Customer B placed one rising-value order per week (growth opportunity). A model trained on this table cannot tell them apart because the aggregation function collapsed the temporal dimension. The when and the trajectory are gone.
To recover this signal manually, you would need to engineer: week-1 count, week-2 count, week-3 count, week-4 count, order value trend slope, inter-order interval trend, acceleration/deceleration flag, last-7-day vs first-7-day ratio. That is 8+ features to recover what a model operating on raw transaction sequences sees natively in the timestamps and amounts.
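A sketch of that manual recovery, computing a few of those features from Customer A's raw rows in the table above (feature names are illustrative):

```python
import numpy as np
import pandas as pd

# Customer A's orders from the example (the "disengaging" pattern)
orders = pd.DataFrame({
    "date": pd.to_datetime(["2025-03-01", "2025-03-02", "2025-03-03",
                            "2025-03-05", "2025-03-07"]),
    "amount": [89.0, 67.0, 45.0, 34.0, 22.0],
})

start = pd.Timestamp("2025-03-01")
week = ((orders["date"] - start).dt.days // 7) + 1

features = {
    # Per-week counts recover the burst-then-silence shape
    **{f"week{w}_count": int((week == w).sum()) for w in range(1, 5)},
    # Linear slope of order value over order index: negative = declining
    "value_trend_slope": float(np.polyfit(range(len(orders)),
                                          orders["amount"], 1)[0]),
    # Mean gap in days between consecutive orders
    "avg_interval_days": float(orders["date"].diff().dt.days.mean()),
}
print(features)  # week1_count=5, weeks 2-4 empty, strongly negative slope
```

Each of these is a separate column someone must imagine, implement, and maintain, per prediction task, whereas a sequence-aware model reads the timestamps and amounts directly.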
Signal 3: Graph topology
Two accounts each have exactly 4 counterparties. Same flat feature. Completely different network structures:
transfers: Account X (normal business)
| from | to | amount | pattern |
|---|---|---|---|
| Acct X | Vendor A | $5,200 | Monthly supplier payment |
| Acct X | Vendor B | $3,100 | Monthly supplier payment |
| Acct X | Vendor C | $8,400 | Quarterly contract |
| Acct X | Payroll Co | $12,000 | Bi-weekly payroll |
Account X sends money to 4 independent vendors. None of the vendors transact with each other. Star topology: one hub, four spokes.
transfers: Account Y (suspected laundering ring)
| from | to | amount | pattern |
|---|---|---|---|
| Acct Y | Shell Co 1 | $4,900 | Just under $5K reporting threshold |
| Shell Co 1 | Shell Co 2 | $4,800 | Next day, minus fee |
| Shell Co 2 | Shell Co 3 | $4,700 | Next day, minus fee |
| Shell Co 3 | Acct Y | $4,600 | Circular return after 3 days |
Account Y's 4 counterparties form a cycle: Y → Shell 1 → Shell 2 → Shell 3 → Y. Money flows in a circle. Classic layering pattern.
Now here is what the compliance team's flat feature table shows:
flat_feature_table (what the AML model sees)
| account | unique_counterparties | total_outflow | avg_txn_size | reality |
|---|---|---|---|---|
| Acct X | 4 | $28,700 | $7,175 | Normal business |
| Acct Y | 4 | $19,000 | $4,750 | Money laundering ring |
Both accounts show unique_counterparties = 4, and Account Y actually looks less suspicious in the flat table: smaller average transaction size, lower total outflow. But Y's money flows in a circle through shell companies (Y → Shell 1 → Shell 2 → Shell 3 → back to Y), while X pays independent vendors. The flat table gives no hint of the difference.
The difference is the shape of the connections. Are the 4 counterparties connected to each other (a ring)? Or independent (a star)? A flat table reduces the entire network to a single number. Graph-based models see the topology natively: they detect cycles, measure clustering coefficients, and identify tightly-connected communities that flat-table models are structurally blind to.
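Detecting the ring requires traversal, not aggregation. A minimal pure-Python sketch, using the two transfer patterns above (account and counterparty names are illustrative):

```python
def on_cycle(edges, account):
    """True if `account` lies on a directed cycle in the transfer graph."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    # Walk forward from the account's counterparties; reaching the
    # account again means money can flow in a circle back to it
    frontier, seen = list(adj.get(account, [])), set()
    while frontier:
        node = frontier.pop()
        if node == account:
            return True
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(adj.get(node, []))
    return False

# Star topology (Account X) vs ring topology (Account Y)
x_edges = [("X", "Vendor A"), ("X", "Vendor B"),
           ("X", "Vendor C"), ("X", "Payroll Co")]
y_edges = [("Y", "Shell 1"), ("Shell 1", "Shell 2"),
           ("Shell 2", "Shell 3"), ("Shell 3", "Y")]

print(on_cycle(x_edges, "X"))  # False: independent vendors, no path back
print(on_cycle(y_edges, "Y"))  # True: Y -> Shell 1 -> Shell 2 -> Shell 3 -> Y
```

No combination of per-account counts or sums yields this answer, because the feature is a property of the edges between counterparties, not of the account's own rows.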
The recurring cost problem
Every new prediction question requires a new feature engineering effort. Want to predict churn? Build a feature table. Want to predict upsell? Build a different feature table. Want to predict fraud? A third feature table. The work does not compound. Each question starts from scratch.
This is why organizations with a $2M annual data science budget might deliver 3 to 5 predictive models per year. The models themselves take days to build. The feature engineering takes months.
Why most projects stall
- Data lives across 10-50 relational tables
- Tools require a single flat input table
- Feature engineering takes 80% of project time
- Every new question restarts the process
- Multi-hop and temporal signals are lost in aggregation
What changes with foundation models
- Model reads relational tables directly
- No flat table or feature engineering required
- Multi-hop and temporal patterns discovered automatically
- Same model answers any prediction question
- Time to first prediction: seconds, not months
The technology gap that caused this
Machine learning was built on flat data. Linear regression, random forests, gradient boosting, and even early deep learning architectures all assume tabular input: one row per sample, one column per feature. This assumption was baked into every major ML framework (scikit-learn, TensorFlow, PyTorch) and every AutoML platform built on top of them.
Relational databases, meanwhile, were designed around normalization: spreading information across multiple tables to avoid redundancy and maintain integrity. A well-designed database is the opposite of a flat table. It is a graph of interconnected entities.
For 30 years, data science bridged this gap manually. The entire discipline of feature engineering exists because ML models cannot read relational data natively. The tools got faster (Spark, Dask, distributed SQL), but the fundamental mismatch remained.
What a foundation model does with this data
Instead of manually computing features like “avg_units_sold_7d” and “promo_lift_ratio” across stores and products, a foundation model reads all three tables directly and discovers the cross-table patterns that drive demand.
PQL Query
PREDICT SUM(daily_sales.units_sold, 0, 7) FOR EACH products.sku, stores.store_id
This single query replaces weeks of feature engineering. The model discovers promo lift patterns, regional seasonality, and supplier-driven substitution effects automatically.
Output
| sku | store_id | predicted_units_7d | top_signal |
|---|---|---|---|
| P4491 | S101 | 614 | Promo calendar + regional trend |
| P4491 | S102 | 289 | Baseline demand, no promo scheduled |
| P4491 | S103 | 178 | Smaller store, lower category penetration |
| P4492 | S101 | 402 | Cross-category substitution from P4491 |
How foundation models change the equation
The breakthrough came from treating a relational database as what it actually is: a graph. Every row is a node. Every foreign key is an edge. Timestamps create a temporal dimension. A database with 12 tables becomes a temporal heterogeneous graph with millions of nodes and edges.
Relational Deep Learning, published at ICML 2024 by researchers at Stanford and Kumo.ai, demonstrated that graph neural networks trained directly on this structure outperform manual feature engineering. On the RelBench benchmark (7 databases, 30 tasks, 103 million rows), the GNN approach beat a Stanford-trained data scientist's manual features on 11 of 12 classification tasks.
KumoRFM, a relational foundation model, pushes this further. Pre-trained on billions of rows across thousands of diverse relational databases, it has learned the universal patterns that recur in relational data: recency effects, frequency signals, temporal dynamics, graph topology, cross-table propagation. At inference time, you point it at your database and describe your prediction task in one line of PQL (Predictive Query Language). No training required.
The numbers are striking. On RelBench classification tasks, KumoRFM zero-shot achieves 76.71 AUROC, compared to 62.44 for LightGBM with manual feature engineering. Fine-tuning pushes KumoRFM to 81.14. The model that required zero human effort outperforms the approach that takes weeks.
What this means for your analytics strategy
If you are evaluating predictive analytics tools, the question is no longer "which model is best." XGBoost, LightGBM, and random forests are all good enough for most tabular prediction tasks. The question is: how does the tool handle your actual data?
If your data lives in a single flat table (a CSV export, a clean data warehouse table), traditional tools work fine. Pick one, train a model, deploy it.
If your data lives across multiple relational tables, which is the case for virtually every enterprise, the feature engineering step is where your project will succeed or fail. You have three options: staff a team to engineer features manually (expensive, slow, lossy), use automated feature generation tools like Featuretools (better but still limited to pre-programmed patterns), or use a model that reads relational data natively.
The third option did not exist two years ago. It does now. And for mission-critical predictions where every percentage point of accuracy translates to real revenue, the choice is not close. KumoRFM outperforms manual feature engineering by 14+ AUROC points on average across 30 benchmark tasks, with zero human effort. For a Fortune 500 company running 50 prediction models, the compound advantage is staggering: instead of a team of 20 data scientists spending 12 months on 5 models, you get 50 models in a week, each more accurate because the foundation model explores the full relational feature space that no human team can enumerate.
This matters most at enterprise scale. When you process millions of transactions per day, a 5% improvement in fraud detection is $50M in annual savings. When you serve 30 million customers, a 2% improvement in churn prediction is $200M in retained revenue. These are the stakes that justify a fundamentally different approach to predictive analytics, and they are exactly the scenarios where KumoRFM's advantage over traditional methods is largest.
The retailer from the opening example does not need a better analytics platform. They need a model that can read 12 tables directly, discover the cross-table patterns that drive demand, and deliver forecasts without a six-month feature engineering project. That technology exists.