Solution Background and Business Value

Demand forecasting is essential for industries such as sales, retail, manufacturing, and supply chain management. Accurate demand predictions help businesses optimize storage, transportation, and inventory planning, reducing waste and improving operational efficiency. For example, a meal kit delivery service benefits from demand forecasting by ensuring they order the correct amount of raw ingredients to match recipe demand, preventing shortages and minimizing waste.

Forecasting accuracy is typically measured using MAE (Mean Absolute Error) or SMAPE (Symmetric Mean Absolute Percentage Error). Depending on business needs, forecasts can be made for different time frames and aggregation levels, such as:

  • Forecasting the number of orders per recipe group six weeks ahead.

  • Predicting the total demand for specific ingredients over the next 28 days.

Data Requirements and Schema

Core Tables

  1. Sales/Orders Table: Captures historical sales transactions.

    • order_id (Primary Key)

    • item_id (Foreign Key referencing Items)

    • buyer_id (Foreign Key referencing Buyers)

    • group_id (Foreign Key referencing Groups)

    • event_id (Foreign Key referencing Events)

    • timestamp (Date and time of the sale)

    • quantity (Number of items sold)

  2. Items Table: Stores static information about each item.

    • item_id (Primary Key)
  3. Groups Table: Categorizes items for aggregated forecasting.

    • group_id (Primary Key)
  4. Buyers Table: Contains buyer information to enhance predictive accuracy.

    • buyer_id (Primary Key)
  5. Events Table: Records external factors that may affect demand, such as holidays or promotions.

    • event_id (Primary Key)

    • timestamp (Date and time of the event)

Entity Relationship Diagram (ERD)

Predictive Queries

Kumo allows flexible demand forecasting using Predictive Queries. Below are five different ways to forecast demand:

  1. Forecasting demand using order count:

    PREDICT COUNT(sales.*, 0, 30, days)
    FOR EACH items.item_id
    
  2. Forecasting demand using sum of item quantities:

    PREDICT SUM(sales.quantity, 0, 30, days)
    FOR EACH items.item_id
    
  3. Forecasting at an aggregated category level:

    PREDICT SUM(sales.quantity, 0, 30, days)
    FOR EACH groups.group_id
    
  4. Predicting demand six weeks into the future:

    PREDICT SUM(sales.quantity, 35, 42, days)
    FOR EACH groups.group_id
    
  5. Filtering out instances where demand is zero:

    PREDICT SUM(sales.quantity, 0, 30, days)
    FOR EACH groups.group_id
    ASSUMING SUM(sales.quantity, 0, 30, days) > 0
    

Deployment Strategy

Demand forecasting models are typically deployed in automated pipelines that refresh predictions on a set cadence (daily, weekly, or monthly). The workflow includes:

  1. Data Preparation: The latest sales and entity data snapshots are refreshed in the data warehouse.

  2. Model Inference: Kumo ingests the data and generates future demand predictions, which are stored in the desired output location.

  3. Data Transformation: If necessary, post-processing is applied, such as ensuring non-negative predictions:

    SELECT item_id,
           CASE
               WHEN TARGET_PRED < 0 THEN 0
               ELSE TARGET_PRED
           END AS TARGET_PRED
    FROM PREDICTIONS
    
  4. Consumption: Predictions are either reviewed by analysts for strategic decision-making or directly integrated into downstream applications for inventory planning, marketing, and logistics.

Building models in Kumo SDK

1. Initialize the Kumo SDK

import kumoai as kumo

# Initialize Kumo client
kumo.init(url="https://<customer_id>.kumoai.cloud/api", api_key=API_KEY)

2. Connect Data

connector = kumo.S3Connector("s3://your-dataset-location/")

3. Select tables

sales = kumo.Table.from_source_table(source_table=connector.table("sales"), primary_key="order_id", time_column="timestamp").infer_metadata()

items = kumo.Table.from_source_table(source_table=connector.table("items"), primary_key="item_id").infer_metadata()

groups = kumo.Table.from_source_table(source_table=connector.table("groups"), primary_key="group_id").infer_metadata()

buyers = kumo.Table.from_source_table(source_table=connector.table("buyers"), primary_key="buyer_id").infer_metadata()

events = kumo.Table.from_source_table(source_table=connector.table("events"), primary_key="event_id", time_column="timestamp").infer_metadata()

4. Create graph schema

graph = kumo.Graph(tables={"sales": sales, "items": items, "groups": groups, "buyers": buyers, "events": events}, edges=[
    {"src_table": "sales", "fkey": "item_id", "dst_table": "items"},
    {"src_table": "sales", "fkey": "buyer_id", "dst_table": "buyers"},
    {"src_table": "sales", "fkey": "group_id", "dst_table": "groups"},
    {"src_table": "sales", "fkey": "event_id", "dst_table": "events"}
])
graph.validate(verbose=True)

5. Train the model

pquery = kumo.PredictiveQuery(graph=graph, query="""PREDICT SUM(sales.quantity, 0, 30, days) FOR EACH items.item_id""")
pquery.validate(verbose=True)
model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)
training_job = trainer.fit(graph=graph, train_table=pquery.generate_training_table(non_blocking=True), non_blocking=False)
print(f"Training metrics: {training_job.metrics()}")