Solution Background and Business Value

Shipment delays can cause significant disruptions in supply chains, leading to increased costs, inventory shortages, and dissatisfied customers. These delays impact production schedules, logistics efficiency, and overall business operations, reducing competitiveness in the market.

Using Kumo’s AI-driven predictive models, businesses can analyze historical shipment data to forecast potential delivery delays. These models use past shipment records, carrier performance, and external factors such as weather and traffic to predict delays, reducing the need for extensive manual feature engineering.

Predicting shipment delays enables better planning and resource allocation, minimizes costs associated with last-minute adjustments, and improves supply chain efficiency. By providing reliable service, businesses can enhance customer satisfaction and maintain a competitive edge in the marketplace.

Data Requirements and Schema

A minimal set of tables can be used to model shipment delays, with additional tables enhancing prediction accuracy. Below is the structure of the data model in Kumo.

Core Tables

  1. Shipments: Contains core shipment details.

    • shipment_id (Primary Key)

    • order_id (Foreign Key referencing Orders)

    • origin (Origin location)

    • destination (Destination location)

    • ship_date (Date of shipment)

    • actual_delivery_date (Date shipment was delivered)

  2. Orders: Contains order information linked to shipments.

    • order_id (Primary Key)

    • customer_id (Foreign Key referencing Customers)

    • order_date (Order placement date)

    • total_amount (Total order value)

    • priority (High, Medium, Low)

  3. Customers: Stores details about customers placing orders.

    • customer_id (Primary Key)

    • Additional customer attributes

  4. Locations: Stores warehouse and delivery location information.

    • location_id (Primary Key)

    • location_name

    • address

    • city, state, zip_code

  5. Shipment_Events: Logs shipment progress and delays.

    • event_id (Primary Key)

    • shipment_id (Foreign Key referencing Shipments)

    • event_date (Event timestamp)

    • event_type (Event status: Picked up, In transit, Delayed, etc.)

    • event_value (Details, including delay duration)

    • location_id (Foreign Key referencing Locations)
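
Because event_value on "Delayed" events carries the delay duration, a per-shipment delay total can be derived from Shipment_Events. The following is a minimal sketch of that idea, not part of Kumo itself. It assumes event_value is a number in one consistent unit for "Delayed" events. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def delay_duration_by_shipment(events):
    """Sum the delay reported by 'Delayed' events for each shipment.

    Assumption: event_value is numeric (one consistent unit) on 'Delayed'
    events. Events of any other type contribute no delay.
    """
    totals = defaultdict(float)
    for event in events:
        # Touch the key so shipments that were never delayed still appear as 0.0.
        totals[event["shipment_id"]] += 0.0
        if event["event_type"] == "Delayed":
            totals[event["shipment_id"]] += float(event["event_value"])
    return dict(totals)


# Hypothetical rows shaped like the Shipment_Events table.
events = [
    {"shipment_id": "S1", "event_type": "Picked up", "event_value": None},
    {"shipment_id": "S1", "event_type": "Delayed", "event_value": 6},
    {"shipment_id": "S1", "event_type": "Delayed", "event_value": 2.5},
    {"shipment_id": "S2", "event_type": "In transit", "event_value": None},
]

delays = delay_duration_by_shipment(events)
print(delays)  # {'S1': 8.5, 'S2': 0.0}
```

A total of 0.0 corresponds to an on-time shipment, so the same figure can serve both a yes/no delay target and a duration target.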

Entity Relationship Diagram (ERD)

Predictive Queries

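Kumo supports flexible shipment delay forecasting through predictive queries. The queries below target a delay_duration column on Shipments, which the schema above does not list, so that column must be present in the table for them to apply: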

  1. Predict if a shipment will be delayed:

    PREDICT Shipments.delay_duration > 0
    FOR EACH Shipments.shipment_id
    
  2. Predict delay for shipments sent in the last 7 days:

    PREDICT Shipments.delay_duration > 0
    FOR EACH Shipments.shipment_id
    WHERE COUNT(Shipment_Events.event_type, -7, 0, days) > 0
    
  3. Predict delay after reaching the first location:

    PREDICT COUNT(Shipment_Events.event_type == "Delayed", 0, X) > 0
    FOR EACH Shipments.shipment_id
    WHERE COUNT(Shipment_Events.event_type == "Shipped", -7, 0, days) > 0
    AND COUNT(Shipment_Events.location_id, -7, 0, days) == 2
    
  4. Predict delay duration for shipments:

    PREDICT Shipments.delay_duration
    FOR EACH Shipments.shipment_id
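
The WHERE filter in the second query keeps only shipments with at least one event in the seven days before the prediction time. The sketch below reproduces that filter over in-memory rows to make the window concrete. It illustrates the intended semantics and is not Kumo's implementation. The window boundaries (start excluded, end included), the function name, and the sample data are assumptions.

```python
from datetime import datetime, timedelta


def shipments_with_recent_events(events, anchor_time, lookback_days=7):
    """Return IDs of shipments with an event in (anchor - lookback, anchor].

    Assumption: the window excludes its start and includes its end.
    """
    window_start = anchor_time - timedelta(days=lookback_days)
    return {
        event["shipment_id"]
        for event in events
        if window_start < event["event_date"] <= anchor_time
    }


# Hypothetical rows shaped like the Shipment_Events table.
events = [
    {"shipment_id": "S1", "event_date": datetime(2024, 3, 8)},
    {"shipment_id": "S2", "event_date": datetime(2024, 2, 20)},  # too old
    {"shipment_id": "S3", "event_date": datetime(2024, 3, 5)},
]

recent = shipments_with_recent_events(events, anchor_time=datetime(2024, 3, 10))
print(sorted(recent))  # ['S1', 'S3']
```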
    

Building Models with the Kumo SDK

1. Select tables

import kumoai as kumo

# Authenticate against your Kumo deployment.
kumo.init(url="<your_kumo_url>", api_key="<your_api_key>")

# Point the connector at the bucket that holds the source tables.
connector = kumo.S3Connector("s3://your-dataset-bucket/")

def load_table(name, primary_key):
    """Create a Kumo table from a source table and infer its column metadata."""
    return kumo.Table.from_source_table(
        source_table=connector.table(name),
        primary_key=primary_key,
    ).infer_metadata()

shipments = load_table('shipments', 'shipment_id')
orders = load_table('orders', 'order_id')
customers = load_table('customers', 'customer_id')
locations = load_table('locations', 'location_id')
shipment_events = load_table('shipment_events', 'event_id')

2. Create graph schema

graph = kumo.Graph(
    tables={
        'shipments': shipments,
        'orders': orders,
        'customers': customers,
        'locations': locations,
        'shipment_events': shipment_events,
    },
    edges=[
        dict(src_table='shipments', fkey='order_id', dst_table='orders'),
        dict(src_table='orders', fkey='customer_id', dst_table='customers'),
        dict(src_table='shipment_events', fkey='shipment_id', dst_table='shipments'),
        dict(src_table='shipment_events', fkey='location_id', dst_table='locations'),
    ],
)
graph.validate(verbose=True)
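
Before connecting to Kumo, the edge list can be sanity-checked against the schema with plain Python. The sketch below is an optional local check, not a replacement for graph.validate. It assumes each foreign key column has the same name as the primary key it references, which holds for the tables described in this document. The column lists are copied from the schema section, and the function name is hypothetical.

```python
# Columns per table, copied from the schema section (customer attributes omitted).
TABLE_COLUMNS = {
    "shipments": ["shipment_id", "order_id", "origin", "destination",
                  "ship_date", "actual_delivery_date"],
    "orders": ["order_id", "customer_id", "order_date", "total_amount", "priority"],
    "customers": ["customer_id"],
    "locations": ["location_id", "location_name", "address",
                  "city", "state", "zip_code"],
    "shipment_events": ["event_id", "shipment_id", "event_date", "event_type",
                        "event_value", "location_id"],
}
PRIMARY_KEYS = {
    "shipments": "shipment_id",
    "orders": "order_id",
    "customers": "customer_id",
    "locations": "location_id",
    "shipment_events": "event_id",
}
# (source table, foreign key column, destination table)
EDGES = [
    ("shipments", "order_id", "orders"),
    ("orders", "customer_id", "customers"),
    ("shipment_events", "shipment_id", "shipments"),
    ("shipment_events", "location_id", "locations"),
]


def find_edge_problems(table_columns, primary_keys, edges):
    """Return human-readable problems; an empty list means the edges look consistent."""
    problems = []
    for src, fkey, dst in edges:
        if fkey not in table_columns.get(src, []):
            problems.append(f"{src} has no column named {fkey}")
        if primary_keys.get(dst) != fkey:
            problems.append(f"{fkey} does not match the primary key of {dst}")
    return problems


problems = find_edge_problems(TABLE_COLUMNS, PRIMARY_KEYS, EDGES)
print(problems)  # []
```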

3. Train the model

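# Regression query: predict the delay duration of each shipment.
# Table names in the query follow the keys used when building the graph.
pquery = kumo.PredictiveQuery(
    graph=graph,
    query="PREDICT shipments.delay_duration FOR EACH shipments.shipment_id",
)
pquery.validate(verbose=True)

# Let Kumo suggest a model plan, then train on the generated training table.
model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)
training_job = trainer.fit(
    graph=graph,
    train_table=pquery.generate_training_table(non_blocking=True),
    non_blocking=False,  # wait for training to finish
)
print(f"Training metrics: {training_job.metrics()}")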

4. Run model

# Generate predictions and write them back through the connector.
prediction_job = trainer.predict(
    graph=graph,
    prediction_table=pquery.generate_prediction_table(non_blocking=True),
    output_connector=connector,
    output_table_name='shipment_delay_predictions',
    non_blocking=False,  # wait for the prediction job to finish
)
print(f"Prediction summary: {prediction_job.summary()}")
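
Once predictions are written out, they can feed the planning decisions described at the start of this document. The sketch below flags shipments whose predicted delay exceeds a threshold and orders them worst first. The row layout (shipment_id, predicted_delay), the threshold, and the function name are hypothetical. The actual columns of the output table should be checked before adapting this.

```python
def flag_at_risk(predictions, threshold=4.0):
    """Return (shipment_id, predicted_delay) pairs above the threshold, worst first."""
    flagged = [
        (row["shipment_id"], row["predicted_delay"])
        for row in predictions
        if row["predicted_delay"] > threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical prediction rows; real output columns may be named differently.
predictions = [
    {"shipment_id": "S1", "predicted_delay": 1.5},
    {"shipment_id": "S2", "predicted_delay": 9.0},
    {"shipment_id": "S3", "predicted_delay": 4.5},
]

at_risk = flag_at_risk(predictions)
print(at_risk)  # [('S2', 9.0), ('S3', 4.5)]
```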