
ETA Prediction

When will this shipment arrive?

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example

Inaccurate ETAs ripple through the supply chain: warehouses staff for arrivals that don't come, production lines idle waiting for delayed components, and customers receive wrong delivery promises. Carrier-provided ETAs are 40-60% inaccurate beyond 3 days out. For a logistics company managing 100K shipments per month, reducing ETA error by 30% saves $18M annually in wasted dock labor, expediting fees, and customer penalties.

Quick answer

Graph neural networks predict shipment arrival times by learning how delays propagate across the logistics network: port congestion, weather disruptions, carrier performance patterns, and route-level delay cascading. Unlike carrier-provided ETAs that are 40-60% inaccurate beyond 3 days out, graph-based models reduce ETA error by 30%, saving $18M annually for a logistics company managing 100K monthly shipments.

Approaches compared

4 ways to solve this problem

1. Carrier-provided ETAs

Use the carrier's own estimated arrival time, typically based on scheduled transit times with basic adjustments for known delays.

Best for

Zero engineering effort. Available immediately for every shipment.

Watch out for

40-60% inaccurate beyond 3 days out. Carriers have limited visibility into port congestion, weather routing, and other carriers' delays on shared routes. ETAs are often optimistic because carriers have commercial incentives to under-report delays.
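A schedule-based ETA of the kind described above comes down to ship date plus scheduled transit time plus any delay the carrier already knows about. The minimal sketch below assumes that the avg_transit_days values from the sample ROUTES table further down this page stand in for the carrier's schedule. The function and dictionary names are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical stand-in for a carrier schedule: transit days per (origin, destination),
# taken from the avg_transit_days column of the sample ROUTES table.
SCHEDULED_TRANSIT_DAYS = {
    ("Shanghai", "Los Angeles"): 14,
    ("Rotterdam", "New York"): 10,
    ("Busan", "Seattle"): 11,
}

def scheduled_eta(origin: str, destination: str, ship_date: date,
                  known_delay_days: int = 0) -> date:
    """Ship date + scheduled transit time + any delay the carrier already reports."""
    transit_days = SCHEDULED_TRANSIT_DAYS[(origin, destination)]
    return ship_date + timedelta(days=transit_days + known_delay_days)

print(scheduled_eta("Shanghai", "Los Angeles", date(2025, 2, 20)))  # prints 2025-03-06
```

With no known-delay adjustment, the three sample shipments come out at 2025-03-06, 2025-03-04 and 2025-03-08, which matches the ORIGINAL_ETA column of the sample prediction output. Nothing in the calculation looks at port congestion or weather, and that is the blind spot described above.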

2. Historical transit time models

Calculate average transit time per carrier-route combination from historical data. Use percentile-based estimates for confidence intervals.

Best for

Better baseline than carrier ETAs. Accounts for carrier-specific performance on each route.

Watch out for

Backward-looking. Cannot incorporate real-time signals like current port congestion, in-transit weather events, or vessel position data. Treats each shipment independently.
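The historical baseline can be sketched in a few lines of standard-library Python. The records, the IDs, and the choice of a 10th to 90th percentile band (an 80% empirical range) are illustrative assumptions, not real data.

```python
import statistics
from collections import defaultdict

# Hypothetical history: (carrier_id, route_id, observed transit days). Not real data.
history = [
    ("CAR01", "RT01", 14), ("CAR01", "RT01", 16), ("CAR01", "RT01", 15),
    ("CAR01", "RT01", 19), ("CAR01", "RT01", 17),
    ("CAR02", "RT02", 10), ("CAR02", "RT02", 11), ("CAR02", "RT02", 10),
    ("CAR02", "RT02", 12), ("CAR02", "RT02", 11),
]

def transit_stats(records):
    """Mean transit time and a 10th-90th percentile band per carrier-route pair."""
    by_pair = defaultdict(list)
    for carrier_id, route_id, days in records:
        by_pair[(carrier_id, route_id)].append(days)
    stats = {}
    for pair, days in by_pair.items():
        # quantiles(n=10) returns the 9 decile cut points; keep the first and last.
        deciles = statistics.quantiles(days, n=10, method="inclusive")
        stats[pair] = {"mean": statistics.mean(days), "p10": deciles[0], "p90": deciles[-1]}
    return stats

for pair, s in transit_stats(history).items():
    print(pair, s)
```

The estimate depends only on past transit times for each carrier-route pair. A storm or a port backlog this week does not move it, which is the backward-looking limitation noted above.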

3. Regression on shipment features (XGBoost, random forest)

Engineer features like 'carrier on-time rate,' 'current port congestion,' 'season,' and 'shipment weight' and train a regression model to predict transit time.

Best for

Incorporates real-time signals that historical averages miss. Good accuracy improvement over carrier ETAs.

Watch out for

Treats each shipment independently. Cannot model delay propagation -- when congestion at Shanghai delays 50 vessels, all downstream ETAs on those routes should shift, but a per-shipment model misses this cascading effect.
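To make "treats each shipment independently" concrete, the sketch below flattens the sample tables into one feature row per shipment. That is the shape of input an XGBoost or random-forest regressor would take. The dictionaries and feature names are illustrative assumptions. Shipment weight is left out because the sample tables have no weight column, and no model is trained here.

```python
from datetime import date

# Lookup tables adapted from the sample CARRIERS, ROUTES and PORT_CONGESTION data.
carriers = {"CAR01": {"on_time_rate": 0.72}, "CAR02": {"on_time_rate": 0.85}}
route_transit_days = {
    ("Shanghai", "Los Angeles"): 14,
    ("Rotterdam", "New York"): 10,
    ("Busan", "Seattle"): 11,
}
port_congestion = {
    "Los Angeles": {"vessels_waiting": 42, "avg_wait_days": 3.2},
    "New York": {"vessels_waiting": 18, "avg_wait_days": 1.0},
    "Seattle": {"vessels_waiting": 12, "avg_wait_days": 0.5},
}
shipments = [
    {"shipment_id": "SHP501", "carrier_id": "CAR01", "origin": "Shanghai",
     "destination": "Los Angeles", "ship_date": date(2025, 2, 20)},
    {"shipment_id": "SHP502", "carrier_id": "CAR02", "origin": "Rotterdam",
     "destination": "New York", "ship_date": date(2025, 2, 22)},
    {"shipment_id": "SHP503", "carrier_id": "CAR01", "origin": "Busan",
     "destination": "Seattle", "ship_date": date(2025, 2, 25)},
]

FEATURE_NAMES = ["carrier_on_time_rate", "route_transit_days",
                 "dest_vessels_waiting", "dest_avg_wait_days", "ship_month"]

def feature_row(shipment):
    """Flatten one shipment into a fixed-length vector using only its own joins."""
    port = port_congestion[shipment["destination"]]
    return [
        carriers[shipment["carrier_id"]]["on_time_rate"],
        route_transit_days[(shipment["origin"], shipment["destination"])],
        port["vessels_waiting"],
        port["avg_wait_days"],
        shipment["ship_date"].month,  # crude stand-in for 'season'
    ]

# One independent row per shipment: the matrix a per-shipment regressor would see.
X = [feature_row(s) for s in shipments]
for s, row in zip(shipments, X):
    print(s["shipment_id"], dict(zip(FEATURE_NAMES, row)))
```

Each row depends only on that shipment's own carrier, route, and destination port. If SHP501 runs late and lengthens the Los Angeles queue, no other LA-bound shipment's row changes until someone refreshes the congestion snapshot or engineers a new feature for it. That is the missing propagation described above.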

4. KumoRFM (relational graph ML)

Connect shipments, carriers, routes, weather, and port congestion into a logistics graph. The GNN learns how delays propagate across the network and compound at intermediate points.

Best for

Captures delay cascading: when port congestion + weather + carrier history compound into a 4-day delay that independent models underestimate. Updates continuously as new signals arrive.

Watch out for

Requires shipment data with carrier, route, and timing information, plus real-time port and weather data feeds. Most impactful for international logistics with multi-leg shipments.
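The sketch below shows how a signal moves along a logistics graph. It links the sample shipments to their carrier and destination port, then runs plain mean-aggregation message passing. This is not Kumo's GNN: there are no learned weights or nonlinearities. It only illustrates the neighborhood flow that GNN layers build on. The edge list and the single starting signal (LA's 3.2-day average wait) are assumptions drawn from the sample tables.

```python
from collections import defaultdict

# Hypothetical edges built from foreign keys in the sample tables:
# shipment -> carrier (carrier_id) and shipment -> destination port.
edges = [
    ("SHP501", "CAR01"), ("SHP501", "Los Angeles"),
    ("SHP502", "CAR02"), ("SHP502", "New York"),
    ("SHP503", "CAR01"), ("SHP503", "Seattle"),
]

neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)  # undirected, so signals flow both ways

def propagate(signal, rounds):
    """Each round, every node's value becomes the mean of its own value and
    its neighbours' values from the previous round."""
    values = dict(signal)
    for _ in range(rounds):
        updated = {}
        for node in neighbors:
            group = [values.get(node, 0.0)] + [values.get(n, 0.0) for n in neighbors[node]]
            updated[node] = sum(group) / len(group)
        values = updated
    return values

# Start with only the Los Angeles congestion signal (days of average wait).
signal = {"Los Angeles": 3.2}
for rounds in (1, 2, 3):
    values = propagate(signal, rounds)
    print(rounds, {s: round(values[s], 3) for s in ("SHP501", "SHP502", "SHP503")})
```

After one round only SHP501 has picked up the LA signal. After three rounds it has reached SHP503 through the shared carrier CAR01 (LA to SHP501 to CAR01 to SHP503), while SHP502 shares no node with that path and stays at zero.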

Key metric: 30% reduction in ETA error saves $18M annually for a logistics company managing 100K monthly shipments across international routes.

Why relational data changes the answer

Shipment delays are not independent events. When port congestion in Los Angeles reaches 42 vessels waiting, every shipment arriving at LA in the next week will be delayed. But the delay compounds: a vessel that was already behind schedule due to a Pacific storm hits the congestion queue and adds 4 days, not just the average 3.2-day wait. The carrier's historical pattern on this route shows they lose an additional day during congestion because of their dock assignment priority. These signals live in different tables -- shipments, carriers, routes, weather, port congestion -- and the interactions between them determine the actual arrival time.

Relational models connect the full logistics graph. They learn that SHP501's 4-day delay is the combined effect of LA port congestion (3.2 days), a Pacific storm en route (0.5 days), carrier OceanLine's historical underperformance during congestion events (0.3 days), and the vessel's current behind-schedule position. On the RelBench benchmark, relational models score 76.71 vs 62.44 for single-table approaches. For ETA prediction, that accuracy gap is the difference between warehouse teams prepared for the actual arrival and costly idle time waiting for shipments that show up days late.
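The SHP501 breakdown is simple arithmetic if the components are treated as additive. That is an assumption made here for illustration, since a learned model need not decompose this way. The vessel-position component is left out because the text gives no figure for it, and the total is rounded to whole days before shifting SHP501's original ETA from the sample output.

```python
from datetime import date, timedelta

# Delay components (days) quoted for SHP501. Assumption: they add linearly.
components = {
    "LA port congestion": 3.2,
    "Pacific storm en route": 0.5,
    "OceanLine underperformance during congestion": 0.3,
}

total_delay = sum(components.values())
original_eta = date(2025, 3, 6)  # ORIGINAL_ETA for SHP501 in the sample output
predicted_eta = original_eta + timedelta(days=round(total_delay))
print(f"{total_delay:.1f} days -> {predicted_eta}")  # prints "4.0 days -> 2025-03-10"
```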

Carrier-provided ETAs are like airline departure boards that show the scheduled time even though three flights ahead of yours are delayed and the airport is running at capacity. Graph-based ETA prediction is like the air traffic control system that sees every aircraft in the queue, every weather delay en route, and every runway constraint, producing an honest arrival estimate that accounts for the full picture.

How KumoRFM solves this

Graph-powered intelligence for supply chains

Kumo connects shipments, carriers, routes, weather forecasts, and port congestion into a logistics graph. The GNN learns how delays propagate: when port congestion in Shanghai affects carrier X's transit times on route Y, and how weather patterns at intermediate points compound into final delivery delays. A PQL query then predicts the arrival time of each shipment, and the predictions update continuously as new signals arrive.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

1. Your data

The relational tables Kumo learns from

SHIPMENTS

shipment_id | carrier_id | origin | destination | ship_date
SHP501 | CAR01 | Shanghai | Los Angeles | 2025-02-20
SHP502 | CAR02 | Rotterdam | New York | 2025-02-22
SHP503 | CAR01 | Busan | Seattle | 2025-02-25

CARRIERS

carrier_id | name | on_time_rate | avg_delay_days
CAR01 | OceanLine Express | 72% | 2.4
CAR02 | Atlantic Cargo | 85% | 1.1

ROUTES

route_id | origin | destination | avg_transit_days | stops
RT01 | Shanghai | Los Angeles | 14 | 0
RT02 | Rotterdam | New York | 10 | 1
RT03 | Busan | Seattle | 11 | 0

WEATHER

region | date | condition | severity
Pacific | 2025-03-02 | Storm | Moderate
Atlantic | 2025-03-01 | Clear | None
Pacific | 2025-03-04 | Fog | Light

PORT_CONGESTION

port | date | vessels_waiting | avg_wait_days
Los Angeles | 2025-03-01 | 42 | 3.2
New York | 2025-03-01 | 18 | 1.0
Seattle | 2025-03-01 | 12 | 0.5

2. Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT FIRST(SHIPMENTS.actual_arrival, 0, 30, days)
FOR EACH SHIPMENTS.shipment_id

3. Prediction output

Every entity gets a score, updated continuously

SHIPMENT_ID | CARRIER | ORIGINAL_ETA | PREDICTED_ETA | DELAY_DAYS
SHP501 | OceanLine Express | 2025-03-06 | 2025-03-10 | +4
SHP502 | Atlantic Cargo | 2025-03-04 | 2025-03-05 | +1
SHP503 | OceanLine Express | 2025-03-08 | 2025-03-09 | +1

4. Understand why

Every prediction includes feature attributions — no black boxes

Shipment SHP501 -- Shanghai to Los Angeles via OceanLine Express

Predicted arrival: March 10 (+4 days delay)

Top contributing features

LA port congestion (42 vessels waiting): 3.2-day average wait, 32% attribution

Pacific storm on route (Mar 2): moderate severity, 26% attribution

Carrier OceanLine historical delay rate: 28% late, 18% attribution

Current vessel position (behind schedule): -1.5 days, 14% attribution

Fog advisory at destination (Mar 4): light, 10% attribution

Feature attributions are computed automatically for every prediction. No separate tooling required.
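This page does not say how Kumo computes its attributions. To make the idea of a per-feature percentage concrete, the sketch below uses leave-one-out occlusion, one simple generic technique, on a toy linear model. The model, its weights, and the feature encoding (storm severity as 2.0 for "moderate") are all hypothetical.

```python
def occlusion_attributions(model, features, baseline):
    """Leave-one-out (occlusion) attribution: reset one feature at a time to its
    baseline, measure how much the prediction changes, and normalise the absolute
    changes to percentages that sum to 100."""
    full = model(features)
    raw = {}
    for name in features:
        occluded = dict(features)
        occluded[name] = baseline[name]
        raw[name] = full - model(occluded)
    total = sum(abs(v) for v in raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}
    return {name: 100 * abs(v) / total for name, v in raw.items()}

# Hypothetical linear delay model (days); the weights are made up for illustration.
WEIGHTS = {"port_wait_days": 1.0, "storm_severity": 0.5, "carrier_late_rate": 1.0}

def toy_delay_model(x):
    return sum(WEIGHTS[k] * x[k] for k in WEIGHTS)

features = {"port_wait_days": 3.2, "storm_severity": 2.0, "carrier_late_rate": 0.28}
baseline = {name: 0.0 for name in features}

attributions = occlusion_attributions(toy_delay_model, features, baseline)
for name, pct in sorted(attributions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

For a linear model like this one, each occlusion score is just weight times (value minus baseline). For models with interactions, such as the compounding delays described on this page, leave-one-out scores need not add up to the prediction, which is one reason methods such as SHAP exist.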

Frequently asked questions

Common questions about ETA prediction

How do you predict shipment arrival times accurately?

Model your logistics network as a graph connecting shipments, carriers, routes, weather, and port congestion. The key is capturing delay propagation -- how congestion at one port compounds with weather delays and carrier performance to produce the actual arrival time. Graph models do this naturally; per-shipment regression models cannot.

Why are carrier-provided ETAs so inaccurate?

Carriers have limited visibility beyond their own operations. They don't see port congestion at the destination until close to arrival, cannot predict weather routing changes, and have commercial incentives to provide optimistic estimates. Their ETAs are 40-60% inaccurate beyond 3 days out because they miss the cascading delay signals across the logistics network.

What data do you need for shipment ETA prediction?

Shipment records with carrier, route, origin, destination, and ship date. Carrier performance history (on-time rates, average delays by route). Real-time port congestion data (vessels waiting, average wait times). Weather forecasts along shipping routes. The more connected data you have, the better the model predicts compound delays.
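One way to pin down these minimum inputs is as typed records, plus a check that every shipment actually joins to the rest of the graph. The class names and the join check are hypothetical. Field names follow the sample tables on this page. Vessel position, which the attribution example uses, is not in those tables and is left out.

```python
import datetime
from dataclasses import dataclass

# Hypothetical record types; field names follow the sample tables on this page.

@dataclass
class Shipment:
    shipment_id: str
    carrier_id: str
    origin: str
    destination: str
    ship_date: datetime.date

@dataclass
class Carrier:
    carrier_id: str
    name: str
    on_time_rate: float      # fraction delivered on time, e.g. 0.72 for 72%
    avg_delay_days: float

@dataclass
class PortCongestion:
    port: str
    date: datetime.date
    vessels_waiting: int
    avg_wait_days: float

@dataclass
class WeatherObservation:
    region: str
    date: datetime.date
    condition: str
    severity: str

def unlinked_shipments(shipments, carriers, congestion):
    """Return IDs of shipments that cannot be joined to a carrier row or to a
    congestion row for their destination port, so they miss that context."""
    known_carriers = {c.carrier_id for c in carriers}
    known_ports = {p.port for p in congestion}
    return [
        s.shipment_id
        for s in shipments
        if s.carrier_id not in known_carriers or s.destination not in known_ports
    ]

shipments = [
    Shipment("SHP501", "CAR01", "Shanghai", "Los Angeles", datetime.date(2025, 2, 20)),
    Shipment("SHP502", "CAR02", "Rotterdam", "New York", datetime.date(2025, 2, 22)),
]
carriers = [Carrier("CAR01", "OceanLine Express", 0.72, 2.4)]
congestion = [PortCongestion("Los Angeles", datetime.date(2025, 3, 1), 42, 3.2)]
print(unlinked_shipments(shipments, carriers, congestion))  # prints ['SHP502']
```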

How does port congestion affect shipment ETAs?

Port congestion creates a queue that affects every inbound vessel, but the delay varies by carrier (dock priority), vessel size, and current congestion trajectory (growing or shrinking). Graph models learn these conditional patterns: OceanLine Express at LA during congestion of 40 or more waiting vessels averages 3.8 days of delay, while Atlantic Cargo at the same congestion level averages 2.1 days because of better dock assignments.
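A graph model learns such conditional effects implicitly. The explicit version below simply groups toy observations by carrier, port, and congestion bucket. The data and the 40-vessel threshold are illustrative assumptions, and the resulting averages are not the figures quoted above.

```python
from collections import defaultdict
from statistics import mean

# Toy observations (hypothetical, not the figures quoted above):
# (carrier, port, vessels waiting when the ship arrived, delay in days).
observations = [
    ("OceanLine Express", "Los Angeles", 45, 4.0),
    ("OceanLine Express", "Los Angeles", 41, 3.0),
    ("OceanLine Express", "Los Angeles", 20, 1.0),
    ("Atlantic Cargo", "Los Angeles", 44, 2.0),
    ("Atlantic Cargo", "Los Angeles", 48, 2.5),
    ("Atlantic Cargo", "Los Angeles", 15, 0.5),
]

def conditional_mean_delay(obs, threshold=40):
    """Average delay per (carrier, port, congestion bucket)."""
    groups = defaultdict(list)
    for carrier, port, vessels_waiting, delay_days in obs:
        bucket = f"{threshold}+" if vessels_waiting >= threshold else f"<{threshold}"
        groups[(carrier, port, bucket)].append(delay_days)
    return {key: mean(delays) for key, delays in groups.items()}

for key, avg in sorted(conditional_mean_delay(observations).items()):
    print(key, round(avg, 2))
```

Each extra conditioning variable (vessel size, whether congestion is growing) multiplies the number of groups, and each group then has fewer observations to average.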

What is the ROI of better ETA prediction?

A logistics company managing 100K monthly shipments saves $18M annually by reducing ETA error by 30%. The savings come from three sources: reduced dock labor waste ($6M from staffing to actual arrivals), lower expediting fees ($8M from proactive rerouting), and fewer customer penalties ($4M from accurate delivery promises).
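The three savings sources add up to the headline figure, and dividing by annual volume gives the implied saving per shipment. This is plain arithmetic on the numbers quoted above, not an independent estimate.

```python
# Savings breakdown quoted above (USD per year) for 100K shipments per month.
savings_usd = {
    "dock labor": 6_000_000,
    "expediting fees": 8_000_000,
    "customer penalties": 4_000_000,
}
shipments_per_year = 100_000 * 12

total_usd = sum(savings_usd.values())
per_shipment_usd = total_usd / shipments_per_year
print(f"${total_usd:,} per year, ${per_shipment_usd:.2f} per shipment")
# prints "$18,000,000 per year, $15.00 per shipment"
```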

Bottom line: A logistics company managing 100K monthly shipments saves $18M per year by reducing ETA error by 30%. Kumo's logistics graph captures delay propagation across routes, ports, weather, and carrier patterns that carrier-provided ETAs systematically miss.

Topics covered

ETA prediction AI, shipment arrival prediction, supply chain visibility ML, logistics ETA model, carrier performance prediction, KumoRFM logistics, route delay forecasting, port congestion prediction

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
