You have probably heard some version of this pitch before: "Just add ML to your demand forecasting and accuracy goes up." It is technically true. But it skips the part that actually matters: why your forecasts are wrong in the first place.
The answer, for most companies, is not that their time series model is bad. It is that their model treats each product as if it exists in a vacuum. Product A's forecast does not know that Product B (its closest substitute) just went out of stock. It does not know that Supplier C is running two weeks late, which will affect 47 other SKUs in the same category. It does not know that a promotion on Product D will cannibalize 15% of Product A's demand.
These cross-product, cross-supplier, cross-store signals are where the accuracy gains live. And most ML approaches, including some very sophisticated ones, still miss them.
Why demand forecasting plateaus at 70%
The root cause is structural: most forecasting methods model each product's demand independently. You build a time series for SKU #4421, fit a model to its history, and project forward. Then you do the same for SKU #4422, and #4423, and all 50,000 SKUs in your catalog.
Each model sees its own history. None of them see the network of relationships that actually drives demand:
- Substitution effects. When SKU #4421 goes out of stock, demand does not disappear. It shifts to related products in the same category. A model forecasting SKU #4422 in isolation will not see this incoming demand spike until it shows up in the historical data, by which time you have already missed the replenishment window.
- Promotional lift propagation. A 20% discount on a hero SKU does not just affect that SKU. It pulls foot traffic to the category, lifts sales of complementary products, and cannibalizes competing brands on the same shelf. These ripple effects span 2-3 hops in the product-category-store graph.
- Supplier constraints. A delayed shipment from a single supplier can affect availability of dozens of SKUs across multiple categories and stores. The demand signal (or rather, the constraint on fulfillable demand) propagates through the supplier-product-store network.
- Store-level signals. Demand at Store #112 is not independent of demand at Store #113 across the street. They share a customer base, respond to the same local events, and compete for the same wallet. Regional demand shifts are visible in the store graph but invisible to per-product models.
Every one of these signals is relational. They live in the connections between products, categories, suppliers, stores, and promotions. A forecasting method that cannot read those connections will plateau, no matter how sophisticated its time series modeling gets.
6 approaches to demand forecasting, ranked
Here is how the major approaches compare across accuracy, effort, and what they can actually see in your data.
| approach | typical accuracy | setup effort | handles cross-product signals | best for |
|---|---|---|---|---|
| 1. Spreadsheets / Excel | 50-60% | Low | No | Small catalogs, early-stage companies, quick sanity checks |
| 2. Statistical methods (ARIMA, Prophet) | 60-65% | Low-Medium | No | Stable demand patterns, single-product forecasting, baselines |
| 3. XGBoost / LightGBM on flat features | 65-75% | Medium-High | Partially (with manual feature engineering) | Teams with data science capacity, tabular demand signals |
| 4. Time series foundation models (Chronos, TimesFM) | 70-80% | Medium | No | Temporal pattern recognition, many-SKU forecasting without per-model tuning |
| 5. Enterprise planning platforms (SAP IBP, Anaplan, Blue Yonder) | 70-80% | Very High (6-18 months) | Partially (rules-based, pre-configured) | Large enterprises with existing ERP investments, integrated S&OP |
| 6. Graph-based ML / KumoRFM | 85-91% | Low (1-3 weeks) | Yes - reads full relational structure automatically | Any company with relational data: product-supplier-store-promotion graphs |
Six approaches ranked by typical accuracy. The accuracy gap between approaches 1-5 and approach 6 comes from cross-product relational signals that only graph-based methods can read.
1. Spreadsheets and Excel
This is where most companies start, and honestly, it works fine for simple cases. Moving averages, seasonal indices, maybe a VLOOKUP to pull in last year's numbers. For a catalog of 50 products with stable demand, a well-maintained spreadsheet can hold its own.
It breaks down at scale. Once you have thousands of SKUs, dozens of stores, frequent promotions, and supplier variability, the spreadsheet becomes a maintenance nightmare. Formulas get copy-pasted wrong. Seasonal adjustments are applied inconsistently. Nobody trusts the numbers, so planners override them with gut feel anyway.
- Best for: Small catalogs under 100 SKUs with stable demand, early-stage companies, and quick sanity checks.
- Watch out for: Breaks down at scale. No cross-product signals. Planners override the numbers anyway, creating a false sense of process.
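The spreadsheet logic is simple enough to sketch in a few lines. This is an illustrative toy (the demand numbers are invented), but it is essentially what a moving-average-plus-seasonal-index workbook computes:

```python
# Sketch of spreadsheet-style forecasting: trailing average x seasonal index.
# All numbers are illustrative, not from any real catalog.

def moving_average(history, window=4):
    """Trailing moving average of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def seasonal_index(history, period=12):
    """Ratio of each period position's mean to the overall mean."""
    overall = sum(history) / len(history)
    indices = []
    for pos in range(period):
        vals = history[pos::period]
        indices.append((sum(vals) / len(vals)) / overall)
    return indices

# Two years of monthly demand with a December spike.
history = [100, 95, 105, 110, 120, 130, 125, 115, 110, 105, 140, 200,
           105, 100, 110, 115, 125, 135, 130, 120, 115, 110, 150, 210]

idx = seasonal_index(history, period=12)
base = moving_average(history, window=4)   # level from the most recent months
next_pos = len(history) % 12               # next month is January (position 0)
forecast = base * idx[next_pos]
print(round(forecast, 1))
```

Note the built-in weakness: the trailing average includes the December spike, which inflates the January forecast. This is exactly the kind of inconsistency that creeps into spreadsheet processes at scale.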
2. Statistical methods: ARIMA, Prophet
The classic upgrade from spreadsheets. ARIMA decomposes demand into trend, seasonality, and noise. Facebook's Prophet makes this easier to configure and handles holidays well. Both are solid for stable, repeating patterns.
The limitation: they see only one product's history at a time. They cannot incorporate external signals (promotions, weather, competitor actions) without significant custom work. And they assume the future looks like the past, which falls apart during disruptions, new product launches, or structural demand shifts.
- Best for: Stable, repeating demand patterns and single-product forecasting baselines.
- Watch out for: Sees only one product at a time. Cannot incorporate promotions, weather, or competitor actions without significant custom work.
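The trend/seasonality/noise split these methods build on can be sketched without any library. This is a toy classical decomposition, not ARIMA itself, with invented quarterly data:

```python
# Toy classical decomposition: demand = trend + seasonality + residual.
# Illustrative only; statsmodels or Prophet do this properly.

def centered_trend(series, period=4):
    """Centered moving average as the trend estimate (None at the edges)."""
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        trend[i] = sum(window) / len(window)
    return trend

series = [20, 30, 50, 24, 22, 33, 55, 26, 25, 36, 58, 28]  # quarterly, Q3 peak
trend = centered_trend(series, period=4)

# Detrend, then average by quarter position to get the seasonal component.
seasonal = [0.0] * 4
counts = [0] * 4
for i, t in enumerate(trend):
    if t is not None:
        seasonal[i % 4] += series[i] - t
        counts[i % 4] += 1
seasonal = [s / c for s, c in zip(seasonal, counts)]

# Residual = what is left after removing trend and seasonality.
residuals = [series[i] - trend[i] - seasonal[i % 4]
             for i in range(len(series)) if trend[i] is not None]
print([round(s, 1) for s in seasonal])  # Q3 carries the positive seasonal lift
```

Prophet and ARIMA do this far more carefully (changepoints, holidays, uncertainty intervals), but the structural limitation is the same: every input here comes from one product's own history.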
3. XGBoost / LightGBM on flat features
This is the current workhorse for many data science teams. Engineer a flat feature table with columns for historical sales, price, promotions, day of week, weather, and any other signal you can flatten into a row. Train XGBoost or LightGBM to predict next-period demand.
It is genuinely better than statistical methods because it can incorporate non-temporal signals: price elasticity, promotional response, and interaction effects. The catch is feature engineering. You have to manually create every feature, and the most important signals (cross-product substitution, supplier effects) require joining and aggregating across multiple tables. Teams typically spend 12+ hours per prediction task on feature engineering alone. Even then, you are limited to the relationships you thought to encode. The signals you did not think of stay hidden.
- Best for: Teams with data science capacity that want to incorporate price elasticity, promotional response, and interaction effects.
- Watch out for: Feature engineering takes 12+ hours per prediction task. Cross-product substitution and supplier effects require complex multi-table joins that most teams give up on.
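The flat feature table at the heart of this approach looks roughly like the sketch below. The column names and values are illustrative, not a real schema:

```python
# Sketch of one row of the flat feature table a gradient boosting
# model consumes. Values and column names are invented.

from datetime import date

def build_row(sku, day, sales_history, price, on_promo):
    """One training row: lag features, calendar features, price, promo flag."""
    return {
        "sku": sku,
        "lag_1": sales_history[-1],             # yesterday's sales
        "lag_7": sales_history[-7],             # same weekday last week
        "ma_7": sum(sales_history[-7:]) / 7,    # trailing weekly average
        "day_of_week": day.weekday(),           # 0 = Monday
        "price": price,
        "on_promo": int(on_promo),
        # Cross-product signals (substitute stockouts, supplier delays)
        # would require joins across other tables -- the hard part.
    }

history = [12, 15, 11, 14, 30, 28, 13, 12, 16, 10, 13, 31, 27, 14]
row = build_row("SKU-4421", date(2024, 6, 3), history, price=3.49, on_promo=False)
print(row)
```

In practice a table like this is built per SKU per day and handed to XGBoost or LightGBM. The commented-out cross-product columns are where the 12+ hours of feature engineering per task go.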
4. Time series foundation models: Chronos, TimesFM
A newer approach that is genuinely exciting. Chronos (from Amazon) and TimesFM (from Google) are pre-trained on massive collections of time series data. They recognize temporal patterns (trend shifts, seasonal shapes, level changes) without per-product model tuning.
For pure temporal forecasting, they are state of the art. If your accuracy gap is driven by complex seasonality or trend changes, these models will help. But they still operate on individual time series. They do not read the relational structure connecting products, stores, and suppliers. Substitution effects, promotional propagation, and supplier constraints remain invisible.
- Best for: Many-SKU forecasting with complex seasonal patterns, without per-model tuning. State of the art for pure temporal forecasting.
- Watch out for: Still operates on individual time series. Blind to substitution effects, promotional propagation, and supplier constraints across products.
5. Enterprise planning platforms: SAP IBP, Anaplan, Blue Yonder
These platforms bundle demand forecasting with broader supply chain planning: inventory optimization, S&OP, and production scheduling. They have built-in ML modules and can incorporate some cross-product signals through pre-configured rules and hierarchical forecasting.
The accuracy is decent (70-80% range), but the setup cost is brutal. Implementations run 6-18 months and require dedicated consultants. The ML components are often black boxes that you cannot inspect or customize. And the cross-product logic is typically rules-based (manually defined substitution groups, cannibalization matrices) rather than learned from data. If the relationships change, you have to update the rules manually.
- Best for: Large enterprises with existing ERP investments that need integrated S&OP across the organization.
- Watch out for: 6-18 month implementations with dedicated consultants. Cross-product logic is rules-based, not learned from data, so it breaks when relationships change.
6. Graph-based ML / KumoRFM
This is where the accuracy jump happens. Instead of treating each product as independent, graph-based ML reads the full network of relationships in your data: products connected to categories, categories to suppliers, suppliers to stores, stores to regions, products to promotions, promotions to time periods.
KumoRFM takes this further. It is a relational foundation model that reads raw relational tables directly from your data warehouse. You do not build a graph. You do not engineer features. You do not train a model. You connect your tables and write a PQL query. The model discovers which relationships matter for each product's forecast automatically.
- Best for: Any company with 1,000+ SKUs and meaningful cross-product effects (substitution, promotions, supplier constraints). Highest accuracy, lowest setup time.
- Watch out for: Requires relational data in a data warehouse; result quality depends on how many tables you connect (products, stores, suppliers, promotions).
The accuracy gain comes from three sources that other approaches miss:
- Substitution patterns learned from data. When SKU A went out of stock last quarter, demand shifted to SKUs B and C. KumoRFM learns these substitution patterns from the product-category-sales graph and applies them to future forecasts.
- Promotional lift propagation. A promotion on one product affects demand for related products 2-3 hops away in the category-store graph. KumoRFM reads these multi-hop effects directly.
- Supplier and inventory signals. A supplier delay that constrains availability for 30 SKUs simultaneously creates demand shifts across the entire affected category. KumoRFM sees the supplier-product-store connections and adjusts forecasts across all affected products at once.
The benchmark evidence
The SAP SALT benchmark tests prediction accuracy on real enterprise relational data. Here is how the approaches compare:
| approach | accuracy | what it means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert-tuned XGBoost by 16 percentage points. The gap comes from relational patterns that a flat feature table structurally cannot contain.
On the RelBench benchmark across 7 databases and 30 prediction tasks:
| approach | AUROC | feature engineering time |
|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |
KumoRFM zero-shot outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning pushes the gap to nearly 19 points.
5 steps to upgrade from spreadsheets to ML forecasting
If you are on spreadsheets today and want to get to ML-powered demand forecasting, here is the practical path. You do not need to boil the ocean.
1. Audit your data. Before touching any ML tool, figure out what data you actually have. At minimum you need: historical sales by product and date. Ideally you also have product master data (categories, brands, attributes), store or location data, supplier information, promotion calendars, and pricing history. Most companies have this scattered across an ERP, a data warehouse, and a few spreadsheets. Get it into one place. A cloud data warehouse (Snowflake, BigQuery, Redshift) is the standard move.
2. Establish a baseline. Before adding ML, measure your current forecast accuracy rigorously. Pick a metric (WMAPE is the most common for demand forecasting), measure it across your full catalog, and break it down by category, store, and product lifecycle stage. This is your baseline. Every ML approach you try gets measured against it. Without this, you are guessing whether ML is actually helping.
3. Start with a gradient boosting pilot. Pick your top 100-500 SKUs (by revenue or volume). Build a simple XGBoost or LightGBM model with basic features: lagged sales, day of week, price, promotion flag, and category. Compare its accuracy to your spreadsheet baseline. If this does not beat your spreadsheet by at least 5%, you likely have a data quality problem, not a modeling problem. Fix the data first.
4. Layer in relational ML. Once your baseline ML model is working, connect your relational data: product-category hierarchies, supplier tables, store attributes, promotion details. This is where KumoRFM shines. Write a PQL query, point it at your connected tables, and compare accuracy against your XGBoost baseline. The typical accuracy gain is 10-20 percentage points because the model now sees cross-product substitution, promotional propagation, and supplier effects.
5. Integrate into your planning workflow. ML forecasts are useless if planners do not trust them. Start by running ML forecasts alongside your current process for 4-8 weeks. Let planners compare and build confidence. Then gradually shift to ML as the primary forecast with human override for edge cases. The goal is not to replace planner judgment. It is to give them a better starting point.
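WMAPE, the baseline metric mentioned above, is worth pinning down: total absolute error divided by total actual demand, with accuracy commonly quoted as 1 - WMAPE. A minimal sketch with invented numbers:

```python
# WMAPE: sum of absolute errors / sum of actuals.
# Accuracy figures like "70%" usually mean 1 - WMAPE.

def wmape(actuals, forecasts):
    abs_error = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    return abs_error / sum(actuals)

# Illustrative week of demand for one category (invented numbers).
actuals   = [120, 80, 100, 140, 90, 200, 160]
forecasts = [100, 90, 110, 120, 95, 170, 150]

err = wmape(actuals, forecasts)
print(f"WMAPE: {err:.1%}  ->  accuracy: {1 - err:.1%}")
```

Because it weights by actual demand, WMAPE punishes misses on high-volume SKUs more than on slow movers, which is usually what a planner wants. Compute it per category and store, not just overall, so you can see where any new approach actually helps.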
Spreadsheet-based demand planning
- Each product forecasted independently in Excel (50-60% accuracy)
- Seasonal adjustments applied manually, often inconsistently
- No visibility into cross-product substitution or promotional cannibalization
- Planner overrides based on gut feel, not data signals
- Supplier disruptions trigger reactive scrambles, not proactive adjustments
- New product forecasts based on 'it looks like Product X' guesswork
KumoRFM demand forecasting
- All products forecasted using full relational graph (85-91% accuracy)
- Seasonal patterns, cross-product effects, and supplier signals captured automatically
- Substitution, cannibalization, and promotional lift read from data
- Planners get a high-accuracy baseline to refine, not a rough guess to fix
- Supplier constraints propagated across affected products proactively
- New products forecasted using category, supplier, and store relationships
PQL Query
PREDICT next_4_weeks_demand FOR EACH products.product_id WHERE products.category = 'beverages'
One PQL query replaces the full demand forecasting pipeline: feature engineering, model training, cross-product signal extraction, and scoring. KumoRFM reads raw product, sales, store, supplier, and promotion tables directly and discovers both temporal and relational demand patterns.
Output
| product_id | predicted_demand | current_forecast | why_kumo_differs |
|---|---|---|---|
| SKU-4421 | 2,840 | 2,200 | Competitor substitute out of stock in 3 regional stores (substitution lift) |
| SKU-4422 | 1,150 | 1,600 | Promotional cannibalization from SKU-4425 discount next week |
| SKU-4423 | 3,200 | 3,100 | Supplier on-time, seasonal trend matches history (small adjustment) |
| SKU-4424 | 890 | 1,400 | Supplier delay affects availability in 12 stores (constrained demand) |
What makes graph-based forecasting different
The gap between approaches 1-5 and approach 6 is not about better time series modeling. It is about reading a different kind of signal entirely: the relationships between products, stores, and suppliers.
Think of it this way. Traditional forecasting asks: "What did this product do in the past, and what will it do next?" That question has a ceiling, because past behavior of a single product does not contain the information you need about substitution, cannibalization, and supply constraints.
Graph-based forecasting asks: "What is happening across the entire network of products, stores, suppliers, and promotions that this product belongs to, and how does that affect what it will do next?" That is a strictly richer question, and when these relational effects are present, it produces materially more accurate answers.
The difference is most visible in three scenarios:
- Stockout-driven substitution. When Product A goes out of stock at Store #112, demand for Products B and C (same category, similar price point) increases at that store and nearby stores. A graph-based model sees the product-category-store connections and predicts this shift. A time series model for Products B and C sees nothing unusual until the substitution demand actually shows up in the data, days later.
- Promotional ripple effects. A buy-one-get-one promotion on a leading brand lifts foot traffic to the entire aisle. Complementary products (chips with salsa, pasta with sauce) see 10-20% demand increase. Competing products see 5-15% cannibalization. These effects propagate 2-3 hops through the product-category graph. Individual product models cannot see them. A graph-based model reads them directly.
- Supplier disruption cascades. A delayed shipment from a single supplier affects 30 SKUs across 4 categories. Some stores can absorb the shortage from inventory. Others cannot, and their customers shift to substitute products. The demand impact cascades through the supplier-product-store-customer graph. Only a model that reads this graph can forecast the full cascade before it plays out.
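The stockout-substitution scenario is easy to make concrete with a toy calculation (all numbers invented): a per-product moving average cannot anticipate the demand that shifts over when a substitute stocks out, while any model that sees the substitute's inventory signal can.

```python
# Toy stockout-substitution example; all numbers are invented.

def moving_average_forecast(history, window=4):
    return sum(history[-window:]) / window

# Product B's history: steady demand around 100 units/week.
b_history = [98, 102, 100, 101, 99, 103, 100]

# Substitute Product A stocks out next week; assume roughly 40% of its
# ~80 units/week shifts to B (a rate a relational model could learn
# from past stockouts in the same category).
a_weekly_demand = 80
substitution_rate = 0.4

per_product = moving_average_forecast(b_history)
relational = per_product + substitution_rate * a_weekly_demand

print(f"per-product forecast: {per_product:.0f}")   # misses the shift
print(f"with stockout signal: {relational:.0f}")    # anticipates it
```

The substitution rate here is hard-coded for illustration; the point of graph-based ML is that rates like this are learned from the product-category-sales history rather than maintained by hand.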
Handling cold starts: new products with no history
New product launches are the Achilles heel of time series forecasting. No history means no forecast. The standard workaround is analogy-based planning: a human picks a "similar" product and uses its demand curve as a proxy. This is slow, subjective, and often wrong.
Graph-based ML handles cold starts differently. A new product has no sales history, but it is not isolated. It belongs to a category. It comes from a supplier. It has a price point, a pack size, a brand. It will be sold in specific stores with known traffic patterns. All of these are relationships in the graph.
KumoRFM reads these connections and transfers demand signals from the product's graph neighborhood. If the supplier's other products in the same category see 25% promotional lift on average, that signal applies to the new product too. If the target stores have strong beverage category velocity, the model knows that. No manual analogy mapping required.
When to use each approach
| situation | recommended approach | why |
|---|---|---|
| Small catalog (<100 SKUs), stable demand | Spreadsheets or Prophet | Low complexity does not justify ML infrastructure investment |
| Large catalog, strong seasonal patterns, limited cross-product effects | Time series FM (Chronos, TimesFM) | Good temporal modeling without per-product tuning |
| Data science team available, some cross-product features needed | XGBoost/LightGBM with manual features | Flexible, interpretable, team can iterate on features |
| Existing SAP/Oracle ERP, need integrated S&OP | Enterprise platform (SAP IBP, Blue Yonder) | Integration with existing ERP ecosystem may outweigh accuracy gap |
| Large catalog, significant cross-product effects (substitution, promotions, supplier constraints) | KumoRFM | Only approach that reads full relational structure automatically. Highest accuracy, lowest setup time. |
| Frequent new product launches, cold start problem | KumoRFM | Graph neighborhood transfers demand signals to products with no history |
| Need highest possible accuracy, willing to combine approaches | Time series FM + KumoRFM | Time series FM for temporal patterns, KumoRFM for relational patterns. Complementary. |
Match the approach to your situation. For most companies with 1,000+ SKUs and meaningful cross-product effects, graph-based ML delivers the largest accuracy gain per unit of effort.