
How to Improve Demand Forecast Accuracy with ML: 6 Approaches Ranked

Most demand forecasting plateaus at 70% accuracy because it treats each product as independent. In reality, products have substitution effects, promotional lift propagation, and supplier relationships that span 2-3 hops. Here are 6 approaches ranked, from spreadsheets to graph-based ML, with benchmark data and a practical upgrade path.

TL;DR

  • Demand forecasting accuracy plateaus at 60-70% when you treat each product as independent. The missing signal is in the relationships: substitution effects when a product goes out of stock, promotional lift that propagates across categories, and supplier constraints that affect availability across your catalog.
  • Six approaches ranked from lowest to highest accuracy: spreadsheets (50-60%), statistical methods like ARIMA/Prophet (60-65%), XGBoost/LightGBM (65-75%), time series foundation models like Chronos/TimesFM (70-80%), enterprise platforms like SAP IBP (70-80%), and graph-based ML like KumoRFM (85-91%).
  • Time series foundation models predict temporal patterns. KumoRFM predicts using the full relational structure: product-category-supplier-store-promotion graphs. These are complementary, not competing.
  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML. On RelBench, KumoRFM zero-shot achieves 76.71 AUROC vs 62.44 for LightGBM with manual features.
  • You can go from spreadsheets to production ML forecasts in under 3 weeks with KumoRFM. No feature engineering, no graph construction, no model training pipeline. Connect your data warehouse and write a PQL query.

You have probably heard some version of this pitch before: "Just add ML to your demand forecasting and accuracy goes up." It is technically true. But it skips the part that actually matters: why your forecasts are wrong in the first place.

The answer, for most companies, is not that their time series model is bad. It is that their model treats each product as if it exists in a vacuum. Product A's forecast does not know that Product B (its closest substitute) just went out of stock. It does not know that Supplier C is running two weeks late, which will affect 47 other SKUs in the same category. It does not know that a promotion on Product D will cannibalize 15% of Product A's demand.

These cross-product, cross-supplier, cross-store signals are where the accuracy gains live. And most ML approaches, including some very sophisticated ones, still miss them.

Why demand forecasting plateaus at 70%

The root cause is structural: most forecasting methods model each product's demand independently. You build a time series for SKU #4421, fit a model to its history, and project forward. Then you do the same for SKU #4422, and #4423, and all 50,000 SKUs in your catalog.
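That per-SKU loop can be sketched in a few lines (hypothetical sales numbers; a seasonal-naive forecaster stands in for whichever model each series gets):

```python
from statistics import mean

def seasonal_naive_forecast(history, season_length=7, horizon=4):
    """Forecast each future period as the value one season earlier.
    Sees only this one SKU's history -- no cross-product signal."""
    if len(history) < season_length:
        return [mean(history)] * horizon  # too little history: fall back to the mean
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]

# Each SKU is forecast independently, exactly as described above.
catalog = {
    "SKU-4421": [100, 120, 90, 110, 105, 130, 95],
    "SKU-4422": [40, 42, 38, 41, 39, 43, 40],
}
forecasts = {sku: seasonal_naive_forecast(hist) for sku, hist in catalog.items()}
print(forecasts["SKU-4421"])  # replays last week's shape: [100, 120, 90, 110]
```

However sophisticated the per-series model gets, the structure of this loop is the limitation: nothing in SKU-4421's forecast can react to anything happening to SKU-4422.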

Each model sees its own history. None of them see the network of relationships that actually drives demand:

  • Substitution effects. When SKU #4421 goes out of stock, demand does not disappear. It shifts to related products in the same category. A model forecasting SKU #4422 in isolation will not see this incoming demand spike until it shows up in the historical data, by which time you have already missed the replenishment window.
  • Promotional lift propagation. A 20% discount on a hero SKU does not just affect that SKU. It pulls foot traffic to the category, lifts sales of complementary products, and cannibalizes competing brands on the same shelf. These ripple effects span 2-3 hops in the product-category-store graph.
  • Supplier constraints. A delayed shipment from a single supplier can affect availability of dozens of SKUs across multiple categories and stores. The demand signal (or rather, the constraint on fulfillable demand) propagates through the supplier-product-store network.
  • Store-level signals. Demand at Store #112 is not independent of demand at Store #113 across the street. They share a customer base, respond to the same local events, and compete for the same wallet. Regional demand shifts are visible in the store graph but invisible to per-product models.

Every one of these signals is relational. They live in the connections between products, categories, suppliers, stores, and promotions. A forecasting method that cannot read those connections will plateau, no matter how sophisticated its time series modeling gets.

6 approaches to demand forecasting, ranked

Here is how the major approaches compare across accuracy, effort, and what they can actually see in your data.


| approach | typical accuracy | setup effort | handles cross-product signals | best for |
| --- | --- | --- | --- | --- |
| 1. Spreadsheets / Excel | 50-60% | Low | No | Small catalogs, early-stage companies, quick sanity checks |
| 2. Statistical methods (ARIMA, Prophet) | 60-65% | Low-Medium | No | Stable demand patterns, single-product forecasting, baselines |
| 3. XGBoost / LightGBM on flat features | 65-75% | Medium-High | Partially (with manual feature engineering) | Teams with data science capacity, tabular demand signals |
| 4. Time series foundation models (Chronos, TimesFM) | 70-80% | Medium | No | Temporal pattern recognition, many-SKU forecasting without per-model tuning |
| 5. Enterprise planning platforms (SAP IBP, Anaplan, Blue Yonder) | 70-80% | Very High (6-18 months) | Partially (rules-based, pre-configured) | Large enterprises with existing ERP investments, integrated S&OP |
| 6. Graph-based ML / KumoRFM | 85-91% | Low (1-3 weeks) | Yes (reads full relational structure automatically) | Any company with relational data: product-supplier-store-promotion graphs |

Six approaches ranked by typical accuracy. The accuracy gap between approaches 1-5 and approach 6 comes from cross-product relational signals that only graph-based methods can read.

1. Spreadsheets and Excel

This is where most companies start, and honestly, it works fine for simple cases. Moving averages, seasonal indices, maybe a VLOOKUP to pull in last year's numbers. For a catalog of 50 products with stable demand, a well-maintained spreadsheet can hold its own.

It breaks down at scale. Once you have thousands of SKUs, dozens of stores, frequent promotions, and supplier variability, the spreadsheet becomes a maintenance nightmare. Formulas get copy-pasted wrong. Seasonal adjustments are applied inconsistently. Nobody trusts the numbers, so planners override them with gut feel anyway.

  • Best for: Small catalogs under 100 SKUs with stable demand, early-stage companies, and quick sanity checks.
  • Watch out for: Breaks down at scale. No cross-product signals. Planners override the numbers anyway, creating a false sense of process.
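The spreadsheet logic above boils down to a moving average with a manual seasonal index. A sketch in Python (illustrative numbers) makes the mechanics explicit:

```python
def moving_average_forecast(sales, window=3):
    """What a typical spreadsheet formula computes: mean of the last N periods."""
    recent = sales[-window:]
    return sum(recent) / len(recent)

def seasonal_index(period_last_year, avg_last_year):
    """Classic spreadsheet seasonal adjustment factor."""
    return period_last_year / avg_last_year

monthly_sales = [80, 95, 110, 120, 100, 90]
base = moving_average_forecast(monthly_sales)       # (120 + 100 + 90) / 3
adjusted = base * seasonal_index(130, 100)          # this month ran 1.3x average last year
print(round(base, 1), round(adjusted, 1))           # 103.3 134.3
```

This is perfectly serviceable for one product; the maintenance nightmare starts when the same two formulas are copy-pasted across 50,000 rows.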

2. Statistical methods: ARIMA, Prophet

The classic upgrade from spreadsheets. ARIMA decomposes demand into trend, seasonality, and noise. Facebook's Prophet makes this easier to configure and handles holidays well. Both are solid for stable, repeating patterns.

The limitation: they see only one product's history at a time. They cannot incorporate external signals (promotions, weather, competitor actions) without significant custom work. And they assume the future looks like the past, which falls apart during disruptions, new product launches, or structural demand shifts.

  • Best for: Stable, repeating demand patterns and single-product forecasting baselines.
  • Watch out for: Sees only one product at a time. Cannot incorporate promotions, weather, or competitor actions without significant custom work.

3. XGBoost / LightGBM on flat features

This is the current workhorse for many data science teams. Engineer a flat feature table with columns for historical sales, price, promotions, day of week, weather, and any other signal you can flatten into a row. Train XGBoost or LightGBM to predict next-period demand.

It is genuinely better than statistical methods because it can incorporate non-temporal signals: price elasticity, promotional response, and interaction effects. The catch is feature engineering. You have to manually create every feature, and the most important signals (cross-product substitution, supplier effects) require joining and aggregating across multiple tables. Teams typically spend 12+ hours per prediction task on feature engineering alone. Even then, you are limited to the relationships you thought to encode. The signals you did not think of stay hidden.

  • Best for: Teams with data science capacity that want to incorporate price elasticity, promotional response, and interaction effects.
  • Watch out for: Feature engineering takes 12+ hours per prediction task. Cross-product substitution and supplier effects require complex multi-table joins that most teams give up on.
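The "flat feature table" step can be sketched without any ML library (hypothetical column names; XGBoost or LightGBM would then train on rows shaped like this):

```python
def build_feature_row(sales_history, price, on_promo, weekday):
    """Flatten one SKU-day into the kind of row a gradient boosting model consumes.
    Every signal must be hand-engineered into its own column."""
    return {
        "lag_1": sales_history[-1],
        "lag_7": sales_history[-7] if len(sales_history) >= 7 else None,
        "rolling_mean_7": sum(sales_history[-7:]) / min(len(sales_history), 7),
        "price": price,
        "on_promo": int(on_promo),
        "weekday": weekday,
        # Cross-product signals (substitute stock status, supplier delays) would
        # each require their own multi-table join -- the costly part of this approach.
    }

row = build_feature_row([90, 95, 100, 110, 105, 98, 102],
                        price=4.99, on_promo=True, weekday=2)
print(row["lag_1"], row["lag_7"], row["rolling_mean_7"])  # 102 90 100.0
```

The temporal columns are cheap. The expensive, often-skipped columns are the relational ones, and those are precisely where the remaining accuracy lives.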

4. Time series foundation models: Chronos, TimesFM

A newer approach that is genuinely exciting. Chronos (from Amazon) and TimesFM (from Google) are pre-trained on massive collections of time series data. They recognize temporal patterns (trend shifts, seasonal shapes, level changes) without per-product model tuning.

For pure temporal forecasting, they are state of the art. If your accuracy gap is driven by complex seasonality or trend changes, these models will help. But they still operate on individual time series. They do not read the relational structure connecting products, stores, and suppliers. Substitution effects, promotional propagation, and supplier constraints remain invisible.

  • Best for: Many-SKU forecasting with complex seasonal patterns, without per-model tuning. State of the art for pure temporal forecasting.
  • Watch out for: Still operates on individual time series. Blind to substitution effects, promotional propagation, and supplier constraints across products.

5. Enterprise planning platforms: SAP IBP, Anaplan, Blue Yonder

These platforms bundle demand forecasting with broader supply chain planning: inventory optimization, S&OP, and production scheduling. They have built-in ML modules and can incorporate some cross-product signals through pre-configured rules and hierarchical forecasting.

The accuracy is decent (70-80% range), but the setup cost is brutal. Implementations run 6-18 months and require dedicated consultants. The ML components are often black boxes that you cannot inspect or customize. And the cross-product logic is typically rules-based (manually defined substitution groups, cannibalization matrices) rather than learned from data. If the relationships change, you have to update the rules manually.

  • Best for: Large enterprises with existing ERP investments that need integrated S&OP across the organization.
  • Watch out for: 6-18 month implementations with dedicated consultants. Cross-product logic is rules-based, not learned from data, so it breaks when relationships change.

6. Graph-based ML / KumoRFM

This is where the accuracy jump happens. Instead of treating each product as independent, graph-based ML reads the full network of relationships in your data: products connected to categories, categories to suppliers, suppliers to stores, stores to regions, products to promotions, promotions to time periods.

KumoRFM takes this further. It is a relational foundation model that reads raw relational tables directly from your data warehouse. You do not build a graph. You do not engineer features. You do not train a model. You connect your tables and write a PQL query. The model discovers which relationships matter for each product's forecast automatically.

  • Best for: Any company with 1,000+ SKUs and meaningful cross-product effects (substitution, promotions, supplier constraints). Highest accuracy, lowest setup time.
  • Watch out for: Requires relational data in a data warehouse. The more tables you connect (products, stores, suppliers, promotions), the better the results.

The accuracy gain comes from three sources that other approaches miss:

  1. Substitution patterns learned from data. When SKU A went out of stock last quarter, demand shifted to SKUs B and C. KumoRFM learns these substitution patterns from the product-category-sales graph and applies them to future forecasts.
  2. Promotional lift propagation. A promotion on one product affects demand for related products 2-3 hops away in the category-store graph. KumoRFM reads these multi-hop effects directly.
  3. Supplier and inventory signals. A supplier delay that constrains availability for 30 SKUs simultaneously creates demand shifts across the entire affected category. KumoRFM sees the supplier-product-store connections and adjusts forecasts across all affected products at once.
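The "2-3 hops" idea is concrete: a breadth-first walk over a toy product-category-supplier graph (hypothetical edges) finds the products a promotion or supplier delay can reach:

```python
from collections import deque

def neighbors_within(graph, start, max_hops):
    """Nodes reachable from `start` in at most `max_hops` edges (plain BFS)."""
    seen, frontier = {start: 0}, deque([start])
    while frontier:
        node = frontier.popleft()
        if seen[node] == max_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                frontier.append(nxt)
    return {n for n, d in seen.items() if 0 < d <= max_hops}

# Toy graph: products link to their category and supplier, which link back out.
graph = {
    "SKU-A": ["cat:soda", "sup:Acme"],
    "cat:soda": ["SKU-A", "SKU-B"],
    "sup:Acme": ["SKU-A", "SKU-C"],
    "SKU-B": ["cat:soda"],
    "SKU-C": ["sup:Acme"],
}
# Two hops from SKU-A reach its in-category substitute and its supplier
# sibling -- signals a per-product time series model never sees.
print(sorted(neighbors_within(graph, "SKU-A", 2)))
```

The point of a relational model is that this neighborhood is read automatically, per prediction, rather than being hand-encoded as features.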

The benchmark evidence

The SAP SALT benchmark tests prediction accuracy on real enterprise relational data. Here is how the approaches compare:


| approach | accuracy | what it means |
| --- | --- | --- |
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |

SAP SALT benchmark: KumoRFM outperforms expert-tuned XGBoost by 16 percentage points. The gap comes from relational patterns that a flat feature table structurally cannot contain.

On the RelBench benchmark across 7 databases and 30 prediction tasks:


| approach | AUROC | feature engineering time |
| --- | --- | --- |
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |

KumoRFM zero-shot outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning pushes the gap to nearly 19 points.

5 steps to upgrade from spreadsheets to ML forecasting

If you are on spreadsheets today and want to get to ML-powered demand forecasting, here is the practical path. You do not need to boil the ocean.

  1. Audit your data. Before touching any ML tool, figure out what data you actually have. At minimum you need: historical sales by product and date. Ideally you also have product master data (categories, brands, attributes), store or location data, supplier information, promotion calendars, and pricing history. Most companies have this scattered across an ERP, a data warehouse, and a few spreadsheets. Get it into one place. A cloud data warehouse (Snowflake, BigQuery, Redshift) is the standard move.
  2. Establish a baseline. Before adding ML, measure your current forecast accuracy rigorously. Pick a metric (WMAPE is the most common for demand forecasting), measure it across your full catalog, and break it down by category, store, and product lifecycle stage. This is your baseline. Every ML approach you try gets measured against it. Without this, you are guessing whether ML is actually helping.
  3. Start with a gradient boosting pilot. Pick your top 100-500 SKUs (by revenue or volume). Build a simple XGBoost or LightGBM model with basic features: lagged sales, day of week, price, promotion flag, and category. Compare its accuracy to your spreadsheet baseline. If this does not beat your spreadsheet by at least 5%, you likely have a data quality problem, not a modeling problem. Fix the data first.
  4. Layer in relational ML. Once your baseline ML model is working, connect your relational data: product-category hierarchies, supplier tables, store attributes, promotion details. This is where KumoRFM shines. Write a PQL query, point it at your connected tables, and compare accuracy against your XGBoost baseline. The typical accuracy gain is 10-20 percentage points because the model now sees cross-product substitution, promotional propagation, and supplier effects.
  5. Integrate into your planning workflow. ML forecasts are useless if planners do not trust them. Start by running ML forecasts alongside your current process for 4-8 weeks. Let planners compare and build confidence. Then gradually shift to ML as the primary forecast with human override for edge cases. The goal is not to replace planner judgment. It is to give them a better starting point.
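The WMAPE baseline from step 2 is only a few lines of code (toy numbers shown):

```python
def wmape(actuals, forecasts):
    """Weighted MAPE: total absolute error over total actual demand.
    The standard demand-forecasting accuracy metric referenced in step 2."""
    abs_err = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    return abs_err / sum(actuals)

actuals = [100, 250, 50, 400]
spreadsheet_forecast = [120, 200, 80, 360]
error = wmape(actuals, spreadsheet_forecast)
accuracy = 1 - error  # "forecast accuracy" as the figures in this article use it
print(f"WMAPE {error:.1%}, accuracy {accuracy:.1%}")  # WMAPE 17.5%, accuracy 82.5%
```

Run this over your full catalog, sliced by category and store, before trying any model: every later approach gets judged against exactly this number.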

Spreadsheet-based demand planning

  • Each product forecasted independently in Excel (50-60% accuracy)
  • Seasonal adjustments applied manually, often inconsistently
  • No visibility into cross-product substitution or promotional cannibalization
  • Planner overrides based on gut feel, not data signals
  • Supplier disruptions trigger reactive scrambles, not proactive adjustments
  • New product forecasts based on 'it looks like Product X' guesswork

KumoRFM demand forecasting

  • All products forecasted using full relational graph (85-91% accuracy)
  • Seasonal patterns, cross-product effects, and supplier signals captured automatically
  • Substitution, cannibalization, and promotional lift read from data
  • Planners get a high-accuracy baseline to refine, not a rough guess to fix
  • Supplier constraints propagated across affected products proactively
  • New products forecasted using category, supplier, and store relationships

PQL Query

```
PREDICT next_4_weeks_demand
FOR EACH products.product_id
WHERE products.category = 'beverages'
```

One PQL query replaces the full demand forecasting pipeline: feature engineering, model training, cross-product signal extraction, and scoring. KumoRFM reads raw product, sales, store, supplier, and promotion tables directly and discovers both temporal and relational demand patterns.

Output

| product_id | predicted_demand | current_forecast | why_kumo_differs |
| --- | --- | --- | --- |
| SKU-4421 | 2,840 | 2,200 | Competitor substitute out of stock in 3 regional stores (substitution lift) |
| SKU-4422 | 1,150 | 1,600 | Promotional cannibalization from SKU-4425 discount next week |
| SKU-4423 | 3,200 | 3,100 | Supplier on-time, seasonal trend matches history (small adjustment) |
| SKU-4424 | 890 | 1,400 | Supplier delay affects availability in 12 stores (constrained demand) |

What makes graph-based forecasting different

The gap between approaches 1-5 and approach 6 is not about better time series modeling. It is about reading a different kind of signal entirely - the relationships between products, stores, and suppliers.

Think of it this way. Traditional forecasting asks: "What did this product do in the past, and what will it do next?" That question has a ceiling, because past behavior of a single product does not contain the information you need about substitution, cannibalization, and supply constraints.

Graph-based forecasting asks: "What is happening across the entire network of products, stores, suppliers, and promotions that this product belongs to, and how does that affect what it will do next?" That is a strictly richer question, and it produces strictly more accurate answers.

The difference is most visible in three scenarios:

  • Stockout-driven substitution. When Product A goes out of stock at Store #112, demand for Products B and C (same category, similar price point) increases at that store and nearby stores. A graph-based model sees the product-category-store connections and predicts this shift. A time series model for Products B and C sees nothing unusual until the substitution demand actually shows up in the data, days later.
  • Promotional ripple effects. A buy-one-get-one promotion on a leading brand lifts foot traffic to the entire aisle. Complementary products (chips with salsa, pasta with sauce) see 10-20% demand increase. Competing products see 5-15% cannibalization. These effects propagate 2-3 hops through the product-category graph. Individual product models cannot see them. A graph-based model reads them directly.
  • Supplier disruption cascades. A delayed shipment from a single supplier affects 30 SKUs across 4 categories. Some stores can absorb the shortage from inventory. Others cannot, and their customers shift to substitute products. The demand impact cascades through the supplier-product-store-customer graph. Only a model that reads this graph can forecast the full cascade before it plays out.
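The stockout scenario can be sketched as a demand-redistribution rule over substitution weights (the weights here are hypothetical inputs; a relational model learns them from the graph rather than taking them as given):

```python
def redistribute_stockout(base_forecasts, stocked_out, substitution):
    """Shift a stocked-out product's forecast demand onto its substitutes
    using (assumed) substitution shares; the unallocated share walks away."""
    adjusted = dict(base_forecasts)
    lost = adjusted.pop(stocked_out)
    for sub, share in substitution[stocked_out].items():
        adjusted[sub] += lost * share
    return adjusted

base = {"SKU-A": 500, "SKU-B": 300, "SKU-C": 200}
# 60% of A's demand shifts to B, 20% to C; the remaining 20% leaves the store.
subs = {"SKU-A": {"SKU-B": 0.6, "SKU-C": 0.2}}
print(redistribute_stockout(base, "SKU-A", subs))
# {'SKU-B': 600.0, 'SKU-C': 300.0}
```

A per-product time series model for SKU-B has no mechanism for this adjustment at all; the 300-unit forecast stands until the substitution spike appears in the historical data.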

Handling cold starts: new products with no history

New product launches are the Achilles heel of time series forecasting. No history means no forecast. The standard workaround is analogy-based planning: a human picks a "similar" product and uses its demand curve as a proxy. This is slow, subjective, and often wrong.

Graph-based ML handles cold starts differently. A new product has no sales history, but it is not isolated. It belongs to a category. It comes from a supplier. It has a price point, a pack size, a brand. It will be sold in specific stores with known traffic patterns. All of these are relationships in the graph.

KumoRFM reads these connections and transfers demand signals from the product's graph neighborhood. If the supplier's other products in the same category see 25% promotional lift on average, that signal applies to the new product too. If the target stores have strong beverage category velocity, the model knows that. No manual analogy mapping required.
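The transfer idea reduces to borrowing statistics from graph neighbors. A deliberately simple sketch (toy numbers; the real model learns a weighting over the neighborhood rather than averaging it):

```python
from statistics import mean

def cold_start_estimate(neighbor_weekly_demand, store_traffic_factor=1.0):
    """Estimate a new product's weekly demand from its graph neighborhood:
    same-category, same-supplier products that do have history."""
    baseline = mean(neighbor_weekly_demand)
    return baseline * store_traffic_factor

# New energy drink: no sales history, but its category/supplier siblings have some.
siblings = [210, 190, 260, 240]  # weekly demand of neighboring products
estimate = cold_start_estimate(siblings, store_traffic_factor=1.15)
print(round(estimate))  # launch stores run ~15% above average traffic: 259
```

The manual version of this is exactly the "it looks like Product X" analogy planning described above, with one human-picked neighbor instead of the whole neighborhood.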

When to use each approach


| situation | recommended approach | why |
| --- | --- | --- |
| Small catalog (<100 SKUs), stable demand | Spreadsheets or Prophet | Low complexity does not justify ML infrastructure investment |
| Large catalog, strong seasonal patterns, limited cross-product effects | Time series FM (Chronos, TimesFM) | Good temporal modeling without per-product tuning |
| Data science team available, some cross-product features needed | XGBoost/LightGBM with manual features | Flexible, interpretable, team can iterate on features |
| Existing SAP/Oracle ERP, need integrated S&OP | Enterprise platform (SAP IBP, Blue Yonder) | Integration with existing ERP ecosystem may outweigh accuracy gap |
| Large catalog, significant cross-product effects (substitution, promotions, supplier constraints) | KumoRFM | Only approach that reads full relational structure automatically; highest accuracy, lowest setup time |
| Frequent new product launches, cold start problem | KumoRFM | Graph neighborhood transfers demand signals to products with no history |
| Need highest possible accuracy, willing to combine approaches | Time series FM + KumoRFM | Time series FM for temporal patterns, KumoRFM for relational patterns; complementary |

Match the approach to your situation. For most companies with 1,000+ SKUs and meaningful cross-product effects, graph-based ML delivers the largest accuracy gain per unit of effort.

Frequently asked questions

How do I improve demand forecast accuracy with ML?

Start by identifying why your current forecasts are wrong. If you are using spreadsheets or basic statistical methods, the biggest accuracy gain comes from incorporating cross-product relationships that these tools ignore. Products have substitution effects (when one SKU goes out of stock, demand shifts to related products), promotional lift that propagates across categories, and supplier constraints that affect availability across your catalog. Traditional forecasting treats each product as independent, which caps accuracy around 60-70%. Moving to ML that captures these relationships - particularly graph-based approaches like KumoRFM - can push accuracy above 90%. On the SAP SALT benchmark, KumoRFM achieves 91% accuracy vs 75% for expert-tuned XGBoost and 63% for LLM+AutoML. The practical path: start with time series ML for temporal patterns, then add relational ML to capture the cross-product, supplier, and store-level signals that drive the biggest accuracy gains.

We are using spreadsheets for demand planning. What ML tools should we try?

If you are on spreadsheets today, skip directly to step 3 or 4 in the upgrade path. Statistical methods like ARIMA and Prophet are marginal improvements over Excel for most retail and CPG use cases. Instead, try a gradient boosting library (XGBoost or LightGBM) with basic features: historical sales, day of week, promotions, price, and weather. This alone typically adds 10-15% accuracy over spreadsheets. Once that is working, look at time series foundation models like Chronos or TimesFM for better temporal pattern recognition. The largest accuracy jump comes from graph-based ML tools like KumoRFM, which read the full relational structure of your data: product-category-supplier-store-promotion connections. KumoRFM requires no feature engineering. You connect your data warehouse tables and write a PQL query like PREDICT next_4_weeks_demand FOR EACH products.product_id. Most teams go from spreadsheets to production ML forecasts in two to three weeks with this approach.

What is the difference between time series forecasting and graph-based demand forecasting?

Time series forecasting (ARIMA, Prophet, Chronos, TimesFM) analyzes the historical demand pattern for each product independently: trend, seasonality, holiday effects, and recent momentum. It answers 'what will this product do next, based on what it did before?' Graph-based demand forecasting (KumoRFM) reads the full relational structure connecting products, categories, suppliers, stores, promotions, and customers. It answers 'what will this product do next, given everything happening across the network it belongs to?' The difference matters most when cross-product effects are large: substitution during stockouts, promotional lift that propagates across categories, supplier disruptions that affect multiple SKUs, and regional demand shifts across store clusters. Time series models are blind to these signals. Graph-based models read them directly. The two approaches are complementary: time series captures temporal patterns, graph-based ML captures relational patterns. KumoRFM handles both in a single model.

How accurate is ML demand forecasting compared to spreadsheets?

Spreadsheet-based demand planning (moving averages, seasonal indices in Excel) typically achieves 50-60% forecast accuracy measured by weighted MAPE or similar metrics. Basic statistical methods like ARIMA and Prophet push this to 60-65%. Gradient boosting (XGBoost, LightGBM) with engineered features reaches 65-75%. Time series foundation models (Chronos, TimesFM) achieve 70-80% on temporal patterns. Enterprise planning platforms (SAP IBP, Blue Yonder) with built-in ML reach 70-80% depending on configuration. Graph-based ML (KumoRFM) achieves 85-91% by capturing cross-product, supplier, and store-level relationships that other approaches miss. On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists using XGBoost with weeks of manual feature engineering.

Can ML demand forecasting handle new product launches with no history?

This is one of the hardest problems in demand forecasting, and it is where graph-based ML has the largest advantage over time series methods. Time series models require historical data by definition. They cannot forecast a product with no sales history. The standard workaround is analogy-based forecasting: manually pick a similar product and use its history as a proxy. Graph-based ML like KumoRFM handles cold starts differently. A new product has no sales history, but it does have relationships: it belongs to a category, comes from a supplier, is priced in a range, targets a customer segment, and will be sold in specific stores. KumoRFM reads these connections and transfers demand signals from related products automatically. If a new energy drink launches from a supplier whose other products see 30% promotional lift, KumoRFM incorporates that signal without any manual analogy mapping.

How long does it take to move from spreadsheets to ML demand forecasting?

It depends on the approach. Building a custom XGBoost pipeline with feature engineering typically takes 2-4 months: data preparation, feature creation, model training, validation, and integration with your planning workflow. A custom GNN or deep learning pipeline takes 4-8 months due to graph construction and infrastructure requirements. Enterprise planning platforms (SAP IBP, Blue Yonder) take 6-18 months for full implementation. KumoRFM takes 1-3 weeks. You connect your existing data warehouse tables (products, sales, stores, suppliers, promotions), write a PQL query, and get forecasts. There is no feature engineering, no graph construction, and no model training step. Most teams run a proof-of-concept against their existing forecast in under a week and move to production within two to three weeks.

What data do I need for ML demand forecasting?

At minimum, you need historical sales data: product, date, quantity sold, and ideally price. That is enough for basic time series models. For gradient boosting, add promotion calendars, weather data, holiday flags, and product attributes (category, brand, pack size). For graph-based ML like KumoRFM, the more relational data you connect, the better the forecast. The highest-impact tables are: product master (categories, brands, attributes), store/location data, supplier information, promotion calendars, pricing history, inventory levels, and customer transaction data if available. KumoRFM reads these tables directly from your data warehouse. You do not need to pre-join them or engineer features. The model discovers which relationships matter for each product's forecast automatically.
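The data audit described above can be automated as a quick check against your warehouse (table and column names here are hypothetical placeholders, adjust to your own schema):

```python
# Minimum vs. high-impact tables for ML demand forecasting, as a quick audit.
REQUIRED = {"sales": {"product_id", "date", "quantity"}}
HIGH_IMPACT = {
    "products": {"product_id", "category", "brand"},
    "stores": {"store_id", "region"},
    "suppliers": {"supplier_id", "lead_time_days"},
    "promotions": {"product_id", "start_date", "discount"},
}

def audit(available_tables):
    """Report which required/high-impact tables and columns are missing."""
    missing = {}
    for name, cols in {**REQUIRED, **HIGH_IMPACT}.items():
        have = set(available_tables.get(name, []))
        if cols - have:
            missing[name] = sorted(cols - have)
    return missing

warehouse = {"sales": ["product_id", "date", "quantity", "price"],
             "products": ["product_id", "category"]}
print(audit(warehouse))  # everything missing beyond sales + partial product master
```

If `sales` shows up in the missing report, stop and fix the data before evaluating any forecasting tool; gaps in the high-impact tables just cap how much relational signal a model can use.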

Does graph-based demand forecasting work for B2B and manufacturing, or only retail?

Graph-based demand forecasting works across industries, but the graph structure differs. In retail, the graph connects products, categories, stores, suppliers, promotions, and customers. In B2B manufacturing, the graph connects finished goods, bill-of-materials components, customers, contracts, lead times, and production capacity. In both cases, the core insight is the same: demand for any single item depends on its relationships to other items, customers, and constraints in the network. A delayed shipment from Supplier A affects demand for Product B, which affects orders from Customer C. KumoRFM reads these multi-hop dependencies directly from your relational data, regardless of whether your business is B2C retail, B2B distribution, or discrete manufacturing.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.