
How to Improve Demand Forecast Accuracy with ML: 6 Approaches Ranked

Most demand forecasting plateaus at 70% accuracy because it treats each product as independent. In reality, products have substitution effects, promotional lift propagation, and supplier relationships that span 2-3 hops. Here are 6 approaches ranked, from spreadsheets to graph-based ML, with benchmark data and a practical upgrade path.

TL;DR

  • Demand forecasting accuracy plateaus at 60-70% when you treat each product as independent. The missing signal is in the relationships: substitution effects when a product goes out of stock, promotional lift that propagates across categories, and supplier constraints that affect availability across your catalog.
  • Six approaches ranked from lowest to highest accuracy: spreadsheets (50-60%), statistical methods like ARIMA/Prophet (60-65%), XGBoost/LightGBM (65-75%), time series foundation models like Chronos/TimesFM (70-80%), enterprise platforms like SAP IBP (70-80%), and graph-based ML like KumoRFM (85-91%).
  • Time series foundation models predict temporal patterns. KumoRFM predicts using the full relational structure: product-category-supplier-store-promotion graphs. These are complementary, not competing.
  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML. On RelBench, KumoRFM zero-shot achieves 76.71 AUROC vs 62.44 for LightGBM with manual features.
  • You can go from spreadsheets to production ML forecasts in under 3 weeks with KumoRFM. No feature engineering, no graph construction, no model training pipeline. Connect your data warehouse and write a PQL query.

You have probably heard some version of this pitch before: "Just add ML to your demand forecasting and accuracy goes up." It is technically true. But it skips the part that actually matters: why your forecasts are wrong in the first place.

The answer, for most companies, is not that their time series model is bad. It is that their model treats each product as if it exists in a vacuum. Product A's forecast does not know that Product B (its closest substitute) just went out of stock. It does not know that Supplier C is running two weeks late, which will affect 47 other SKUs in the same category. It does not know that a promotion on Product D will cannibalize 15% of Product A's demand.

These cross-product, cross-supplier, cross-store signals are where the accuracy gains live. And most ML approaches, including some very sophisticated ones, still miss them.

Why demand forecasting plateaus at 70%

The root cause is structural: most forecasting methods model each product's demand independently. You build a time series for SKU #4421, fit a model to its history, and project forward. Then you do the same for SKU #4422, and #4423, and all 50,000 SKUs in your catalog.
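That per-SKU loop can be sketched in a few lines (hypothetical sales numbers; a seasonal-naive forecaster stands in for whichever model each series gets):

```python
from statistics import mean

def seasonal_naive_forecast(history, season_length=7, horizon=4):
    """Forecast each future period as the value one season earlier.
    Sees only this one SKU's history -- no cross-product signal."""
    if len(history) < season_length:
        return [mean(history)] * horizon  # too little history: fall back to the mean
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]

# Each SKU is forecast independently, exactly as described above.
catalog = {
    "SKU-4421": [100, 120, 90, 110, 105, 130, 95],
    "SKU-4422": [40, 42, 38, 41, 39, 43, 40],
}
forecasts = {sku: seasonal_naive_forecast(hist) for sku, hist in catalog.items()}
print(forecasts["SKU-4421"])  # replays last week's shape: [100, 120, 90, 110]
```

However sophisticated the per-series model gets, the structure of this loop is the limitation: nothing in SKU-4421's forecast can react to anything happening to SKU-4422.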

Each model sees its own history. None of them see the network of relationships that actually drives demand:

  • Substitution effects. When SKU #4421 goes out of stock, demand does not disappear. It shifts to related products in the same category. A model forecasting SKU #4422 in isolation will not see this incoming demand spike until it shows up in the historical data, by which time you have already missed the replenishment window.
  • Promotional lift propagation. A 20% discount on a hero SKU does not just affect that SKU. It pulls foot traffic to the category, lifts sales of complementary products, and cannibalizes competing brands on the same shelf. These ripple effects span 2-3 hops in the product-category-store graph.
  • Supplier constraints. A delayed shipment from a single supplier can affect availability of dozens of SKUs across multiple categories and stores. The demand signal (or rather, the constraint on fulfillable demand) propagates through the supplier-product-store network.
  • Store-level signals. Demand at Store #112 is not independent of demand at Store #113 across the street. They share a customer base, respond to the same local events, and compete for the same wallet. Regional demand shifts are visible in the store graph but invisible to per-product models.

Every one of these signals is relational. They live in the connections between products, categories, suppliers, stores, and promotions. A forecasting method that cannot read those connections will plateau, no matter how sophisticated its time series modeling gets.

6 approaches to demand forecasting, ranked

Here is how the major approaches compare across accuracy, effort, and what they can actually see in your data.


| approach | typical accuracy | setup effort | handles cross-product signals | best for |
| --- | --- | --- | --- | --- |
| 1. Spreadsheets / Excel | 50-60% | Low | No | Small catalogs, early-stage companies, quick sanity checks |
| 2. Statistical methods (ARIMA, Prophet) | 60-65% | Low-Medium | No | Stable demand patterns, single-product forecasting, baselines |
| 3. XGBoost / LightGBM on flat features | 65-75% | Medium-High | Partially (with manual feature engineering) | Teams with data science capacity, tabular demand signals |
| 4. Time series foundation models (Chronos, TimesFM) | 70-80% | Medium | No | Temporal pattern recognition, many-SKU forecasting without per-model tuning |
| 5. Enterprise planning platforms (SAP IBP, Anaplan, Blue Yonder) | 70-80% | Very High (6-18 months) | Partially (rules-based, pre-configured) | Large enterprises with existing ERP investments, integrated S&OP |
| 6. Graph-based ML / KumoRFM | 85-91% | Low (1-3 weeks) | Yes (reads full relational structure automatically) | Any company with relational data: product-supplier-store-promotion graphs |

Six approaches ranked by typical accuracy. The accuracy gap between approaches 1-5 and approach 6 comes from cross-product relational signals that only graph-based methods can read.

1. Spreadsheets and Excel

This is where most companies start, and honestly, it works fine for simple cases. Moving averages, seasonal indices, maybe a VLOOKUP to pull in last year's numbers. For a catalog of 50 products with stable demand, a well-maintained spreadsheet can hold its own.

It breaks down at scale. Once you have thousands of SKUs, dozens of stores, frequent promotions, and supplier variability, the spreadsheet becomes a maintenance nightmare. Formulas get copy-pasted wrong. Seasonal adjustments are applied inconsistently. Nobody trusts the numbers, so planners override them with gut feel anyway.

  • Best for: Small catalogs under 100 SKUs with stable demand, early-stage companies, and quick sanity checks.
  • Watch out for: Breaks down at scale. No cross-product signals. Planners override the numbers anyway, creating a false sense of process.
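The spreadsheet logic above boils down to a moving average with a manual seasonal index. A sketch in Python (illustrative numbers) makes the mechanics explicit:

```python
def moving_average_forecast(sales, window=3):
    """What a typical spreadsheet formula computes: mean of the last N periods."""
    recent = sales[-window:]
    return sum(recent) / len(recent)

def seasonal_index(period_last_year, avg_last_year):
    """Classic spreadsheet seasonal adjustment factor."""
    return period_last_year / avg_last_year

monthly_sales = [80, 95, 110, 120, 100, 90]
base = moving_average_forecast(monthly_sales)       # (120 + 100 + 90) / 3
adjusted = base * seasonal_index(130, 100)          # this month ran 1.3x average last year
print(round(base, 1), round(adjusted, 1))           # 103.3 134.3
```

This is perfectly serviceable for one product; the maintenance nightmare starts when the same two formulas are copy-pasted across 50,000 rows.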

2. Statistical methods: ARIMA, Prophet

The classic upgrade from spreadsheets. ARIMA decomposes demand into trend, seasonality, and noise. Facebook's Prophet makes this easier to configure and handles holidays well. Both are solid for stable, repeating patterns.

The limitation: they see only one product's history at a time. They cannot incorporate external signals (promotions, weather, competitor actions) without significant custom work. And they assume the future looks like the past, which falls apart during disruptions, new product launches, or structural demand shifts.

  • Best for: Stable, repeating demand patterns and single-product forecasting baselines.
  • Watch out for: Sees only one product at a time. Cannot incorporate promotions, weather, or competitor actions without significant custom work.

3. XGBoost / LightGBM on flat features

This is the current workhorse for many data science teams. Engineer a flat feature table with columns for historical sales, price, promotions, day of week, weather, and any other signal you can flatten into a row. Train XGBoost or LightGBM to predict next-period demand.

It is genuinely better than statistical methods because it can incorporate non-temporal signals: price elasticity, promotional response, and interaction effects. The catch is feature engineering. You have to manually create every feature, and the most important signals (cross-product substitution, supplier effects) require joining and aggregating across multiple tables. Teams typically spend 12+ hours per prediction task on feature engineering alone. Even then, you are limited to the relationships you thought to encode. The signals you did not think of stay hidden.

  • Best for: Teams with data science capacity that want to incorporate price elasticity, promotional response, and interaction effects.
  • Watch out for: Feature engineering takes 12+ hours per prediction task. Cross-product substitution and supplier effects require complex multi-table joins that most teams give up on.
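The "flat feature table" step can be sketched without any ML library (hypothetical column names; XGBoost or LightGBM would then train on rows shaped like this):

```python
def build_feature_row(sales_history, price, on_promo, weekday):
    """Flatten one SKU-day into the kind of row a gradient boosting model consumes.
    Every signal must be hand-engineered into its own column."""
    return {
        "lag_1": sales_history[-1],
        "lag_7": sales_history[-7] if len(sales_history) >= 7 else None,
        "rolling_mean_7": sum(sales_history[-7:]) / min(len(sales_history), 7),
        "price": price,
        "on_promo": int(on_promo),
        "weekday": weekday,
        # Cross-product signals (substitute stock status, supplier delays) would
        # each require their own multi-table join -- the costly part of this approach.
    }

row = build_feature_row([90, 95, 100, 110, 105, 98, 102],
                        price=4.99, on_promo=True, weekday=2)
print(row["lag_1"], row["lag_7"], row["rolling_mean_7"])  # 102 90 100.0
```

The temporal columns are cheap. The expensive, often-skipped columns are the relational ones, and those are precisely where the remaining accuracy lives.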

4. Time series foundation models: Chronos, TimesFM

A newer approach that is genuinely exciting. Chronos (from Amazon) and TimesFM (from Google) are pre-trained on massive collections of time series data. They recognize temporal patterns (trend shifts, seasonal shapes, level changes) without per-product model tuning.

For pure temporal forecasting, they are state of the art. If your accuracy gap is driven by complex seasonality or trend changes, these models will help. But they still operate on individual time series. They do not read the relational structure connecting products, stores, and suppliers. Substitution effects, promotional propagation, and supplier constraints remain invisible.

  • Best for: Many-SKU forecasting with complex seasonal patterns, without per-model tuning. State of the art for pure temporal forecasting.
  • Watch out for: Still operates on individual time series. Blind to substitution effects, promotional propagation, and supplier constraints across products.

5. Enterprise planning platforms: SAP IBP, Anaplan, Blue Yonder

These platforms bundle demand forecasting with broader supply chain planning: inventory optimization, S&OP, and production scheduling. They have built-in ML modules and can incorporate some cross-product signals through pre-configured rules and hierarchical forecasting.

The accuracy is decent (70-80% range), but the setup cost is brutal. Implementations run 6-18 months and require dedicated consultants. The ML components are often black boxes that you cannot inspect or customize. And the cross-product logic is typically rules-based (manually defined substitution groups, cannibalization matrices) rather than learned from data. If the relationships change, you have to update the rules manually.

  • Best for: Large enterprises with existing ERP investments that need integrated S&OP across the organization.
  • Watch out for: 6-18 month implementations with dedicated consultants. Cross-product logic is rules-based, not learned from data, so it breaks when relationships change.

6. Graph-based ML / KumoRFM

This is where the accuracy jump happens. Instead of treating each product as independent, graph-based ML reads the full network of relationships in your data: products connected to categories, categories to suppliers, suppliers to stores, stores to regions, products to promotions, promotions to time periods.

KumoRFM takes this further. It is a relational foundation model that reads raw relational tables directly from your data warehouse. You do not build a graph. You do not engineer features. You do not train a model. You connect your tables and write a PQL query. The model discovers which relationships matter for each product's forecast automatically.

  • Best for: Any company with 1,000+ SKUs and meaningful cross-product effects (substitution, promotions, supplier constraints). Highest accuracy, lowest setup time.
  • Watch out for: Requires relational data in a data warehouse. The more tables you connect (products, stores, suppliers, promotions), the better the results.

The accuracy gain comes from three sources that other approaches miss:

  1. Substitution patterns learned from data. When SKU A went out of stock last quarter, demand shifted to SKUs B and C. KumoRFM learns these substitution patterns from the product-category-sales graph and applies them to future forecasts.
  2. Promotional lift propagation. A promotion on one product affects demand for related products 2-3 hops away in the category-store graph. KumoRFM reads these multi-hop effects directly.
  3. Supplier and inventory signals. A supplier delay that constrains availability for 30 SKUs simultaneously creates demand shifts across the entire affected category. KumoRFM sees the supplier-product-store connections and adjusts forecasts across all affected products at once.
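The "2-3 hops" idea is concrete: a breadth-first walk over a toy product-category-supplier graph (hypothetical edges) finds the products a promotion or supplier delay can reach:

```python
from collections import deque

def neighbors_within(graph, start, max_hops):
    """Nodes reachable from `start` in at most `max_hops` edges (plain BFS)."""
    seen, frontier = {start: 0}, deque([start])
    while frontier:
        node = frontier.popleft()
        if seen[node] == max_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                frontier.append(nxt)
    return {n for n, d in seen.items() if 0 < d <= max_hops}

# Toy graph: products link to their category and supplier, which link back out.
graph = {
    "SKU-A": ["cat:soda", "sup:Acme"],
    "cat:soda": ["SKU-A", "SKU-B"],
    "sup:Acme": ["SKU-A", "SKU-C"],
    "SKU-B": ["cat:soda"],
    "SKU-C": ["sup:Acme"],
}
# Two hops from SKU-A reach its in-category substitute and its supplier
# sibling -- signals a per-product time series model never sees.
print(sorted(neighbors_within(graph, "SKU-A", 2)))
```

The point of a relational model is that this neighborhood is read automatically, per prediction, rather than being hand-encoded as features.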

The benchmark evidence

The SAP SALT benchmark tests prediction accuracy on real enterprise relational data. Here is how the approaches compare:


| approach | accuracy | what it means |
| --- | --- | --- |
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |

SAP SALT benchmark: KumoRFM outperforms expert-tuned XGBoost by 16 percentage points. The gap comes from relational patterns that a flat feature table structurally cannot contain.

On the RelBench benchmark across 7 databases and 30 prediction tasks:


| approach | AUROC | feature engineering time |
| --- | --- | --- |
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |

KumoRFM zero-shot outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning pushes the gap to nearly 19 points.

5 steps to upgrade from spreadsheets to ML forecasting

If you are on spreadsheets today and want to get to ML-powered demand forecasting, here is the practical path. You do not need to boil the ocean.

  1. Audit your data. Before touching any ML tool, figure out what data you actually have. At minimum you need: historical sales by product and date. Ideally you also have product master data (categories, brands, attributes), store or location data, supplier information, promotion calendars, and pricing history. Most companies have this scattered across an ERP, a data warehouse, and a few spreadsheets. Get it into one place. A cloud data warehouse (Snowflake, BigQuery, Redshift) is the standard move.
  2. Establish a baseline. Before adding ML, measure your current forecast accuracy rigorously. Pick a metric (WMAPE is the most common for demand forecasting), measure it across your full catalog, and break it down by category, store, and product lifecycle stage. This is your baseline. Every ML approach you try gets measured against it. Without this, you are guessing whether ML is actually helping.
  3. Start with a gradient boosting pilot. Pick your top 100-500 SKUs (by revenue or volume). Build a simple XGBoost or LightGBM model with basic features: lagged sales, day of week, price, promotion flag, and category. Compare its accuracy to your spreadsheet baseline. If this does not beat your spreadsheet by at least 5%, you likely have a data quality problem, not a modeling problem. Fix the data first.
  4. Layer in relational ML. Once your baseline ML model is working, connect your relational data: product-category hierarchies, supplier tables, store attributes, promotion details. This is where KumoRFM shines. Write a PQL query, point it at your connected tables, and compare accuracy against your XGBoost baseline. The typical accuracy gain is 10-20 percentage points because the model now sees cross-product substitution, promotional propagation, and supplier effects.
  5. Integrate into your planning workflow. ML forecasts are useless if planners do not trust them. Start by running ML forecasts alongside your current process for 4-8 weeks. Let planners compare and build confidence. Then gradually shift to ML as the primary forecast with human override for edge cases. The goal is not to replace planner judgment. It is to give them a better starting point.
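The WMAPE baseline from step 2 is only a few lines of code (toy numbers shown):

```python
def wmape(actuals, forecasts):
    """Weighted MAPE: total absolute error over total actual demand.
    The standard demand-forecasting accuracy metric referenced in step 2."""
    abs_err = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    return abs_err / sum(actuals)

actuals = [100, 250, 50, 400]
spreadsheet_forecast = [120, 200, 80, 360]
error = wmape(actuals, spreadsheet_forecast)
accuracy = 1 - error  # "forecast accuracy" as the figures in this article use it
print(f"WMAPE {error:.1%}, accuracy {accuracy:.1%}")  # WMAPE 17.5%, accuracy 82.5%
```

Run this over your full catalog, sliced by category and store, before trying any model: every later approach gets judged against exactly this number.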

Spreadsheet-based demand planning

  • Each product forecasted independently in Excel (50-60% accuracy)
  • Seasonal adjustments applied manually, often inconsistently
  • No visibility into cross-product substitution or promotional cannibalization
  • Planner overrides based on gut feel, not data signals
  • Supplier disruptions trigger reactive scrambles, not proactive adjustments
  • New product forecasts based on 'it looks like Product X' guesswork

KumoRFM demand forecasting

  • All products forecasted using full relational graph (85-91% accuracy)
  • Seasonal patterns, cross-product effects, and supplier signals captured automatically
  • Substitution, cannibalization, and promotional lift read from data
  • Planners get a high-accuracy baseline to refine, not a rough guess to fix
  • Supplier constraints propagated across affected products proactively
  • New products forecasted using category, supplier, and store relationships

PQL Query

```
PREDICT next_4_weeks_demand
FOR EACH products.product_id
WHERE products.category = 'beverages'
```

One PQL query replaces the full demand forecasting pipeline: feature engineering, model training, cross-product signal extraction, and scoring. KumoRFM reads raw product, sales, store, supplier, and promotion tables directly and discovers both temporal and relational demand patterns.

Output

| product_id | predicted_demand | current_forecast | why_kumo_differs |
| --- | --- | --- | --- |
| SKU-4421 | 2,840 | 2,200 | Competitor substitute out of stock in 3 regional stores (substitution lift) |
| SKU-4422 | 1,150 | 1,600 | Promotional cannibalization from SKU-4425 discount next week |
| SKU-4423 | 3,200 | 3,100 | Supplier on-time, seasonal trend matches history (small adjustment) |
| SKU-4424 | 890 | 1,400 | Supplier delay affects availability in 12 stores (constrained demand) |

What makes graph-based forecasting different

The gap between approaches 1-5 and approach 6 is not about better time series modeling. It is about reading a different kind of signal entirely - the relationships between products, stores, and suppliers.

Think of it this way. Traditional forecasting asks: "What did this product do in the past, and what will it do next?" That question has a ceiling, because past behavior of a single product does not contain the information you need about substitution, cannibalization, and supply constraints.

Graph-based forecasting asks: "What is happening across the entire network of products, stores, suppliers, and promotions that this product belongs to, and how does that affect what it will do next?" That is a strictly richer question, and it produces strictly more accurate answers.

The difference is most visible in three scenarios:

  • Stockout-driven substitution. When Product A goes out of stock at Store #112, demand for Products B and C (same category, similar price point) increases at that store and nearby stores. A graph-based model sees the product-category-store connections and predicts this shift. A time series model for Products B and C sees nothing unusual until the substitution demand actually shows up in the data, days later.
  • Promotional ripple effects. A buy-one-get-one promotion on a leading brand lifts foot traffic to the entire aisle. Complementary products (chips with salsa, pasta with sauce) see 10-20% demand increase. Competing products see 5-15% cannibalization. These effects propagate 2-3 hops through the product-category graph. Individual product models cannot see them. A graph-based model reads them directly.
  • Supplier disruption cascades. A delayed shipment from a single supplier affects 30 SKUs across 4 categories. Some stores can absorb the shortage from inventory. Others cannot, and their customers shift to substitute products. The demand impact cascades through the supplier-product-store-customer graph. Only a model that reads this graph can forecast the full cascade before it plays out.
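The stockout scenario can be sketched as a demand-redistribution rule over substitution weights (the weights here are hypothetical inputs; a relational model learns them from the graph rather than taking them as given):

```python
def redistribute_stockout(base_forecasts, stocked_out, substitution):
    """Shift a stocked-out product's forecast demand onto its substitutes
    using (assumed) substitution shares; the unallocated share walks away."""
    adjusted = dict(base_forecasts)
    lost = adjusted.pop(stocked_out)
    for sub, share in substitution[stocked_out].items():
        adjusted[sub] += lost * share
    return adjusted

base = {"SKU-A": 500, "SKU-B": 300, "SKU-C": 200}
# 60% of A's demand shifts to B, 20% to C; the remaining 20% leaves the store.
subs = {"SKU-A": {"SKU-B": 0.6, "SKU-C": 0.2}}
print(redistribute_stockout(base, "SKU-A", subs))
# {'SKU-B': 600.0, 'SKU-C': 300.0}
```

A per-product time series model for SKU-B has no mechanism for this adjustment at all; the 300-unit forecast stands until the substitution spike appears in the historical data.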

Handling cold starts: new products with no history

New product launches are the Achilles heel of time series forecasting. No history means no forecast. The standard workaround is analogy-based planning: a human picks a "similar" product and uses its demand curve as a proxy. This is slow, subjective, and often wrong.

Graph-based ML handles cold starts differently. A new product has no sales history, but it is not isolated. It belongs to a category. It comes from a supplier. It has a price point, a pack size, a brand. It will be sold in specific stores with known traffic patterns. All of these are relationships in the graph.

KumoRFM reads these connections and transfers demand signals from the product's graph neighborhood. If the supplier's other products in the same category see 25% promotional lift on average, that signal applies to the new product too. If the target stores have strong beverage category velocity, the model knows that. No manual analogy mapping required.
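The transfer idea reduces to borrowing statistics from graph neighbors. A deliberately simple sketch (toy numbers; the real model learns a weighting over the neighborhood rather than averaging it):

```python
from statistics import mean

def cold_start_estimate(neighbor_weekly_demand, store_traffic_factor=1.0):
    """Estimate a new product's weekly demand from its graph neighborhood:
    same-category, same-supplier products that do have history."""
    baseline = mean(neighbor_weekly_demand)
    return baseline * store_traffic_factor

# New energy drink: no sales history, but its category/supplier siblings have some.
siblings = [210, 190, 260, 240]  # weekly demand of neighboring products
estimate = cold_start_estimate(siblings, store_traffic_factor=1.15)
print(round(estimate))  # launch stores run ~15% above average traffic: 259
```

The manual version of this is exactly the "it looks like Product X" analogy planning described above, with one human-picked neighbor instead of the whole neighborhood.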

When to use each approach


| situation | recommended approach | why |
| --- | --- | --- |
| Small catalog (<100 SKUs), stable demand | Spreadsheets or Prophet | Low complexity does not justify ML infrastructure investment |
| Large catalog, strong seasonal patterns, limited cross-product effects | Time series FM (Chronos, TimesFM) | Good temporal modeling without per-product tuning |
| Data science team available, some cross-product features needed | XGBoost/LightGBM with manual features | Flexible, interpretable, team can iterate on features |
| Existing SAP/Oracle ERP, need integrated S&OP | Enterprise platform (SAP IBP, Blue Yonder) | Integration with existing ERP ecosystem may outweigh accuracy gap |
| Large catalog, significant cross-product effects (substitution, promotions, supplier constraints) | KumoRFM | Only approach that reads full relational structure automatically; highest accuracy, lowest setup time |
| Frequent new product launches, cold start problem | KumoRFM | Graph neighborhood transfers demand signals to products with no history |
| Need highest possible accuracy, willing to combine approaches | Time series FM + KumoRFM | Time series FM for temporal patterns, KumoRFM for relational patterns; complementary |

Match the approach to your situation. For most companies with 1,000+ SKUs and meaningful cross-product effects, graph-based ML delivers the largest accuracy gain per unit of effort.

Frequently asked questions

How do I improve demand forecast accuracy with ML?

Start by identifying why your current forecasts are wrong. If you are using spreadsheets or basic statistical methods, the biggest accuracy gain comes from incorporating cross-product relationships that these tools ignore. Products have substitution effects (when one SKU goes out of stock, demand shifts to related products), promotional lift that propagates across categories, and supplier constraints that affect availability across your catalog. Traditional forecasting treats each product as independent, which caps accuracy around 60-70%. Moving to ML that captures these relationships - particularly graph-based approaches like KumoRFM - can push accuracy above 90%. On the SAP SALT benchmark, KumoRFM achieves 91% accuracy vs 75% for expert-tuned XGBoost and 63% for LLM+AutoML. The practical path: start with time series ML for temporal patterns, then add relational ML to capture the cross-product, supplier, and store-level signals that drive the biggest accuracy gains.

We are using spreadsheets for demand planning. What ML tools should we try?

If you are on spreadsheets today, skip directly to step 3 or 4 in the upgrade path. Statistical methods like ARIMA and Prophet are marginal improvements over Excel for most retail and CPG use cases. Instead, try a gradient boosting library (XGBoost or LightGBM) with basic features: historical sales, day of week, promotions, price, and weather. This alone typically adds 10-15% accuracy over spreadsheets. Once that is working, look at time series foundation models like Chronos or TimesFM for better temporal pattern recognition. The largest accuracy jump comes from graph-based ML tools like KumoRFM, which read the full relational structure of your data: product-category-supplier-store-promotion connections. KumoRFM requires no feature engineering. You connect your data warehouse tables and write a PQL query like PREDICT next_4_weeks_demand FOR EACH products.product_id. Most teams go from spreadsheets to production ML forecasts in two to three weeks with this approach.

What is the difference between time series forecasting and graph-based demand forecasting?

Time series forecasting (ARIMA, Prophet, Chronos, TimesFM) analyzes the historical demand pattern for each product independently: trend, seasonality, holiday effects, and recent momentum. It answers 'what will this product do next, based on what it did before?' Graph-based demand forecasting (KumoRFM) reads the full relational structure connecting products, categories, suppliers, stores, promotions, and customers. It answers 'what will this product do next, given everything happening across the network it belongs to?' The difference matters most when cross-product effects are large: substitution during stockouts, promotional lift that propagates across categories, supplier disruptions that affect multiple SKUs, and regional demand shifts across store clusters. Time series models are blind to these signals. Graph-based models read them directly. The two approaches are complementary: time series captures temporal patterns, graph-based ML captures relational patterns. KumoRFM handles both in a single model.

How accurate is ML demand forecasting compared to spreadsheets?

Spreadsheet-based demand planning (moving averages, seasonal indices in Excel) typically achieves 50-60% forecast accuracy measured by weighted MAPE or similar metrics. Basic statistical methods like ARIMA and Prophet push this to 60-65%. Gradient boosting (XGBoost, LightGBM) with engineered features reaches 65-75%. Time series foundation models (Chronos, TimesFM) achieve 70-80% on temporal patterns. Enterprise planning platforms (SAP IBP, Blue Yonder) with built-in ML reach 70-80% depending on configuration. Graph-based ML (KumoRFM) achieves 85-91% by capturing cross-product, supplier, and store-level relationships that other approaches miss. On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists using XGBoost with weeks of manual feature engineering.

Can ML demand forecasting handle new product launches with no history?

This is one of the hardest problems in demand forecasting, and it is where graph-based ML has the largest advantage over time series methods. Time series models require historical data by definition. They cannot forecast a product with no sales history. The standard workaround is analogy-based forecasting: manually pick a similar product and use its history as a proxy. Graph-based ML like KumoRFM handles cold starts differently. A new product has no sales history, but it does have relationships: it belongs to a category, comes from a supplier, is priced in a range, targets a customer segment, and will be sold in specific stores. KumoRFM reads these connections and transfers demand signals from related products automatically. If a new energy drink launches from a supplier whose other products see 30% promotional lift, KumoRFM incorporates that signal without any manual analogy mapping.

How long does it take to move from spreadsheets to ML demand forecasting?

It depends on the approach. Building a custom XGBoost pipeline with feature engineering typically takes 2-4 months: data preparation, feature creation, model training, validation, and integration with your planning workflow. A custom GNN or deep learning pipeline takes 4-8 months due to graph construction and infrastructure requirements. Enterprise planning platforms (SAP IBP, Blue Yonder) take 6-18 months for full implementation. KumoRFM takes 1-3 weeks. You connect your existing data warehouse tables (products, sales, stores, suppliers, promotions), write a PQL query, and get forecasts. There is no feature engineering, no graph construction, and no model training step. Most teams run a proof-of-concept against their existing forecast in under a week and move to production within two to three weeks.

What data do I need for ML demand forecasting?

At minimum, you need historical sales data: product, date, quantity sold, and ideally price. That is enough for basic time series models. For gradient boosting, add promotion calendars, weather data, holiday flags, and product attributes (category, brand, pack size). For graph-based ML like KumoRFM, the more relational data you connect, the better the forecast. The highest-impact tables are: product master (categories, brands, attributes), store/location data, supplier information, promotion calendars, pricing history, inventory levels, and customer transaction data if available. KumoRFM reads these tables directly from your data warehouse. You do not need to pre-join them or engineer features. The model discovers which relationships matter for each product's forecast automatically.
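The data audit described above can be automated as a quick check against your warehouse (table and column names here are hypothetical placeholders, adjust to your own schema):

```python
# Minimum vs. high-impact tables for ML demand forecasting, as a quick audit.
REQUIRED = {"sales": {"product_id", "date", "quantity"}}
HIGH_IMPACT = {
    "products": {"product_id", "category", "brand"},
    "stores": {"store_id", "region"},
    "suppliers": {"supplier_id", "lead_time_days"},
    "promotions": {"product_id", "start_date", "discount"},
}

def audit(available_tables):
    """Report which required/high-impact tables and columns are missing."""
    missing = {}
    for name, cols in {**REQUIRED, **HIGH_IMPACT}.items():
        have = set(available_tables.get(name, []))
        if cols - have:
            missing[name] = sorted(cols - have)
    return missing

warehouse = {"sales": ["product_id", "date", "quantity", "price"],
             "products": ["product_id", "category"]}
print(audit(warehouse))  # everything missing beyond sales + partial product master
```

If `sales` shows up in the missing report, stop and fix the data before evaluating any forecasting tool; gaps in the high-impact tables just cap how much relational signal a model can use.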

Does graph-based demand forecasting work for B2B and manufacturing, or only retail?

Graph-based demand forecasting works across industries, but the graph structure differs. In retail, the graph connects products, categories, stores, suppliers, promotions, and customers. In B2B manufacturing, the graph connects finished goods, bill-of-materials components, customers, contracts, lead times, and production capacity. In both cases, the core insight is the same: demand for any single item depends on its relationships to other items, customers, and constraints in the network. A delayed shipment from Supplier A affects demand for Product B, which affects orders from Customer C. KumoRFM reads these multi-hop dependencies directly from your relational data, regardless of whether your business is B2C retail, B2B distribution, or discrete manufacturing.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.