What data is needed for ETA prediction?

Kumo connects directly to your existing relational tables: SHIPMENTS, CARRIERS, ROUTES, WEATHER, PORT_CONGESTION. No ETL or feature engineering required. Write a PQL query and get explainable predictions in minutes.

Use Case #4 · Regression · ETA Prediction

ETA Prediction

“When will this shipment arrive?”

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example

When will this shipment arrive?

Inaccurate ETAs ripple through the supply chain: warehouses staff for arrivals that don't come, production lines idle waiting for delayed components, and customers receive wrong delivery promises. Carrier-provided ETAs are 40-60% inaccurate beyond 3 days out. For a logistics company managing 100K shipments per month, reducing ETA error by 30% saves $18M annually in wasted dock labor, expediting fees, and customer penalties.

Quick answer

Graph neural networks predict shipment arrival times by learning how delays propagate across the logistics network: port congestion, weather disruptions, carrier performance patterns, and route-level delay cascading. Unlike carrier-provided ETAs that are 40-60% inaccurate beyond 3 days out, graph-based models reduce ETA error by 30%, saving $18M annually for a logistics company managing 100K monthly shipments.

Approaches compared

4 ways to solve this problem

1. Carrier-provided ETAs

Use the carrier's own estimated arrival time, typically based on scheduled transit times with basic adjustments for known delays.

Best for

Zero engineering effort. Available immediately for every shipment.

Watch out for

40-60% inaccurate beyond 3 days out. Carriers have limited visibility into port congestion, weather routing, and other carriers' delays on shared routes. ETAs are often optimistic because carriers have commercial incentives to under-report delays.

2. Historical transit time models

Calculate average transit time per carrier-route combination from historical data. Use percentile-based estimates for confidence intervals.
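A minimal sketch of this baseline, assuming a history table with one completed transit time per shipment. The rows in `HISTORY` are hypothetical, and the P10/P50/P90 choice is just one reasonable way to express the percentile-based range:

```python
from collections import defaultdict
from statistics import mean, quantiles

# Hypothetical history: (carrier_id, origin, destination, transit_days).
# Illustrative rows only, not real carrier data.
HISTORY = [
    ("CAR01", "Shanghai", "Los Angeles", 14),
    ("CAR01", "Shanghai", "Los Angeles", 15),
    ("CAR01", "Shanghai", "Los Angeles", 17),
    ("CAR01", "Shanghai", "Los Angeles", 16),
    ("CAR01", "Shanghai", "Los Angeles", 20),
    ("CAR02", "Rotterdam", "New York", 10),
    ("CAR02", "Rotterdam", "New York", 11),
    ("CAR02", "Rotterdam", "New York", 10),
    ("CAR02", "Rotterdam", "New York", 12),
]

def transit_baseline(history):
    """Mean and P10/P50/P90 transit days per (carrier, origin, destination)."""
    groups = defaultdict(list)
    for carrier, origin, dest, days in history:
        groups[(carrier, origin, dest)].append(days)
    baseline = {}
    for key, days in groups.items():
        # quantiles(n=10) returns the 9 decile cut points: index 0 = P10, 4 = P50, 8 = P90.
        deciles = quantiles(days, n=10, method="inclusive")
        baseline[key] = {
            "mean": mean(days),
            "p10": deciles[0],
            "p50": deciles[4],
            "p90": deciles[8],
        }
    return baseline

baseline = transit_baseline(HISTORY)
print(baseline[("CAR01", "Shanghai", "Los Angeles")])
```

Every estimate here is a fixed lookup per carrier-route pair, which is exactly why this approach cannot react to today's congestion or weather.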

Best for

Better baseline than carrier ETAs. Accounts for carrier-specific performance on each route.

Watch out for

Backward-looking. Cannot incorporate real-time signals like current port congestion, in-transit weather events, or vessel position data. Treats each shipment independently.

3. Regression on shipment features (XGBoost, random forest)

Engineer features like 'carrier on-time rate,' 'current port congestion,' 'season,' and 'shipment weight' and train a regression model to predict transit time.
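A sketch of the feature-engineering step using the sample SHIPMENTS, CARRIERS, and PORT_CONGESTION rows shown later on this page. The feature names are hypothetical, shipment weight is omitted because the sample tables have no such column, and using a single congestion snapshot per port is a simplification:

```python
from datetime import date

# Sample rows from the SHIPMENTS, CARRIERS, and PORT_CONGESTION tables on this page.
SHIPMENTS = [
    {"shipment_id": "SHP501", "carrier_id": "CAR01", "destination": "Los Angeles", "ship_date": date(2025, 2, 20)},
    {"shipment_id": "SHP502", "carrier_id": "CAR02", "destination": "New York", "ship_date": date(2025, 2, 22)},
    {"shipment_id": "SHP503", "carrier_id": "CAR01", "destination": "Seattle", "ship_date": date(2025, 2, 25)},
]
CARRIERS = {
    "CAR01": {"on_time_rate": 0.72, "avg_delay_days": 2.4},
    "CAR02": {"on_time_rate": 0.85, "avg_delay_days": 1.1},
}
# One congestion snapshot per port (all dated 2025-03-01 in the sample).
PORT_CONGESTION = {
    "Los Angeles": {"vessels_waiting": 42, "avg_wait_days": 3.2},
    "New York": {"vessels_waiting": 18, "avg_wait_days": 1.0},
    "Seattle": {"vessels_waiting": 12, "avg_wait_days": 0.5},
}

def feature_row(shipment):
    """Flatten one shipment plus its joined carrier and destination-port rows."""
    carrier = CARRIERS[shipment["carrier_id"]]
    port = PORT_CONGESTION[shipment["destination"]]
    return {
        "shipment_id": shipment["shipment_id"],
        "carrier_on_time_rate": carrier["on_time_rate"],
        "carrier_avg_delay_days": carrier["avg_delay_days"],
        "dest_vessels_waiting": port["vessels_waiting"],
        "dest_avg_wait_days": port["avg_wait_days"],
        "ship_month": shipment["ship_date"].month,  # crude seasonality feature
    }

rows = [feature_row(s) for s in SHIPMENTS]
for row in rows:
    print(row)
```

Each row is built, and later scored, in isolation: nothing in SHP501's row reflects what is happening to SHP503, even though both ship with CAR01.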

Best for

Incorporates real-time signals that historical averages miss. Good accuracy improvement over carrier ETAs.

Watch out for

Treats each shipment independently. Cannot model delay propagation -- when congestion at Shanghai delays 50 vessels, all downstream ETAs on those routes should shift, but a per-shipment model misses this cascading effect.
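To make the propagation point concrete, here is a hand-written sketch (not how a GNN learns it) in which one port disruption re-scores every shipment whose itinerary touches that port. The itineraries and the fixed per-stop slack are hypothetical assumptions:

```python
from collections import defaultdict

# Hypothetical itineraries (ports visited in order); not from the page's sample data.
ITINERARIES = {
    "SHP_A": ["Shanghai", "Busan", "Los Angeles"],
    "SHP_B": ["Shanghai", "Los Angeles"],
    "SHP_C": ["Rotterdam", "New York"],
}
SLACK_DAYS_PER_PORT = 0.5  # assumed schedule buffer a vessel can recover at each stop

def ports_to_shipments(itineraries):
    """Reverse index: port -> shipments whose itinerary touches it."""
    index = defaultdict(set)
    for sid, ports in itineraries.items():
        for port in ports:
            index[port].add(sid)
    return index

def cascaded_delay(ports, port_delay):
    """Carry delay leg by leg: add each port's extra wait, subtract slack, floor at zero."""
    delay = 0.0
    for port in ports:
        delay = max(0.0, delay + port_delay.get(port, 0.0) - SLACK_DAYS_PER_PORT)
    return delay

def shipments_to_rescore(itineraries, port_delay):
    """Every shipment that visits a disrupted port gets a new delay estimate."""
    index = ports_to_shipments(itineraries)
    affected = set()
    for port in port_delay:
        affected |= index.get(port, set())
    return {sid: cascaded_delay(itineraries[sid], port_delay) for sid in sorted(affected)}

# One congestion event at Shanghai shifts both shipments that pass through it.
print(shipments_to_rescore(ITINERARIES, {"Shanghai": 2.0}))
# prints {'SHP_A': 0.5, 'SHP_B': 1.0}
```

The same 2-day disruption lands differently depending on each itinerary, and an unrelated shipment (SHP_C) is untouched. A per-shipment model only sees this if someone engineers a feature for it.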

4. KumoRFM (relational graph ML)

Connect shipments, carriers, routes, weather, and port congestion into a logistics graph. The GNN learns how delays propagate across the network and compound at intermediate points.

Best for

Captures delay cascading, such as port congestion, weather, and carrier history compounding into a 4-day delay that independent models underestimate. Updates continuously as new signals arrive.

Watch out for

Requires shipment data with carrier, route, and timing information, plus real-time port and weather data feeds. Most impactful for international logistics with multi-leg shipments.

Key metric: 30% reduction in ETA error saves $18M annually for a logistics company managing 100K monthly shipments across international routes.

Why relational data changes the answer

Shipment delays are not independent events. When port congestion in Los Angeles reaches 42 vessels waiting, every shipment arriving at LA in the next week will be delayed. But the delay compounds: a vessel that was already behind schedule due to a Pacific storm hits the congestion queue and adds 4 days, not just the average 3.2-day wait. The carrier's historical pattern on this route shows they lose an additional day during congestion because of their dock assignment priority. These signals live in different tables -- shipments, carriers, routes, weather, port congestion -- and the interactions between them determine the actual arrival time.

Relational models connect the full logistics graph. They learn that SHP501's 4-day delay reflects the combined effect of LA port congestion (3.2 days), a Pacific storm en route (0.5 days), carrier OceanLine's historical under-performance during congestion events (0.3 days), and the vessel's current behind-schedule position. On the RelBench benchmark, relational models score 76.71 vs 62.44 for single-table approaches. For ETA prediction, that accuracy gap is the difference between warehouse teams prepared for the actual arrival and costly idle time waiting for shipments that show up days late.
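The arithmetic behind SHP501's figure, treating the stated day contributions as additive (an assumption for illustration; model attributions need not combine this way). The paragraph gives no day value for the vessel's behind-schedule position, so it is left out:

```python
# Day contributions to SHP501's delay, as stated in the paragraph above.
components = {
    "LA port congestion": 3.2,
    "Pacific storm en route": 0.5,
    "OceanLine under-performance during congestion": 0.3,
}

total_delay = round(sum(components.values()), 1)
congestion_only = components["LA port congestion"]
# What an estimate based only on the average port wait would miss.
underestimate = round(total_delay - congestion_only, 1)

print(f"combined delay: {total_delay} days")
print(f"average-wait-only estimate misses {underestimate} days")
```

An estimate built on the 3.2-day average wait alone comes in 0.8 days short of the 4-day delay.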

Carrier-provided ETAs are like airline departure boards that show the scheduled time even though three flights ahead of yours are delayed and the airport is running at capacity. Graph-based ETA prediction is like the air traffic control system that sees every aircraft in the queue, every weather delay en route, and every runway constraint, producing an honest arrival estimate that accounts for the full picture.

How KumoRFM solves this

Graph-powered intelligence for supply chains

Kumo connects shipments, carriers, routes, weather forecasts, and port congestion into a logistics graph. The GNN learns how delays propagate: for example, how port congestion in Shanghai affects carrier X's transit times on route Y, and how weather at intermediate points compounds into final delivery delays. PQL predicts the arrival time of each shipment, updating continuously as new signals arrive.
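A sketch of what "connecting the tables into a graph" means structurally, built from the sample SHIPMENTS rows below. This is not Kumo's internal representation. The lane-to-region mapping used to attach WEATHER is an assumption, since the sample tables have no region column on shipments or routes:

```python
from collections import defaultdict

# Rows from the SHIPMENTS sample table: (shipment_id, carrier_id, origin, destination).
SHIPMENTS = [
    ("SHP501", "CAR01", "Shanghai", "Los Angeles"),
    ("SHP502", "CAR02", "Rotterdam", "New York"),
    ("SHP503", "CAR01", "Busan", "Seattle"),
]
# Assumed lane-to-region mapping (hypothetical link used to reach WEATHER rows).
LANE_REGION = {
    ("Shanghai", "Los Angeles"): "Pacific",
    ("Rotterdam", "New York"): "Atlantic",
    ("Busan", "Seattle"): "Pacific",
}

def build_graph(shipments):
    """Undirected heterogeneous graph; each node is a (node_type, key) tuple."""
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for sid, carrier, origin, dest in shipments:
        shipment = ("shipment", sid)
        route = ("route", (origin, dest))
        link(shipment, ("carrier", carrier))
        link(shipment, route)
        # Port nodes are where PORT_CONGESTION rows would attach.
        link(route, ("port", origin))
        link(route, ("port", dest))
        # Region nodes are where WEATHER rows would attach.
        link(route, ("region", LANE_REGION[(origin, dest)]))
    return adj

def k_hop(adj, start, k):
    """Nodes within k hops of start: the receptive field of a k-layer message-passing GNN."""
    seen = {start}
    frontier = {start}
    for _ in range(k):
        frontier = {nbr for node in frontier for nbr in adj[node]} - seen
        seen |= frontier
    return seen - {start}

graph = build_graph(SHIPMENTS)
# SHP503 is within two hops of SHP501 through the shared carrier CAR01.
print(sorted(k_hop(graph, ("shipment", "SHP501"), 2), key=str))
```

Two hops out, SHP501's neighborhood already includes its destination port, the Pacific weather region, and another shipment on the same carrier: the signals a per-shipment feature row leaves out.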

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Your data

The relational tables Kumo learns from

SHIPMENTS

shipment_id	carrier_id	origin	destination	ship_date
SHP501	CAR01	Shanghai	Los Angeles	2025-02-20
SHP502	CAR02	Rotterdam	New York	2025-02-22
SHP503	CAR01	Busan	Seattle	2025-02-25

CARRIERS

carrier_id	name	on_time_rate	avg_delay_days
CAR01	OceanLine Express	72%	2.4
CAR02	Atlantic Cargo	85%	1.1

ROUTES

route_id	origin	destination	avg_transit_days	stops
RT01	Shanghai	Los Angeles	14	0
RT02	Rotterdam	New York	10	1
RT03	Busan	Seattle	11	0

WEATHER

region	date	condition	severity
Pacific	2025-03-02	Storm	Moderate
Atlantic	2025-03-01	Clear	None
Pacific	2025-03-04	Fog	Light

PORT_CONGESTION

port	date	vessels_waiting	avg_wait_days
Los Angeles	2025-03-01	42	3.2
New York	2025-03-01	18	1.0
Seattle	2025-03-01	12	0.5
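SHIPMENTS has no `route_id`, so this sketch assumes it joins to ROUTES on (origin, destination). Adding each route's `avg_transit_days` to the ship date gives a schedule-based ETA; for the three sample shipments, these dates match the ORIGINAL_ETA column of the prediction output on this page:

```python
from datetime import date, timedelta

# Rows from the SHIPMENTS and ROUTES sample tables above.
SHIPMENTS = [
    ("SHP501", "Shanghai", "Los Angeles", date(2025, 2, 20)),
    ("SHP502", "Rotterdam", "New York", date(2025, 2, 22)),
    ("SHP503", "Busan", "Seattle", date(2025, 2, 25)),
]
ROUTES = {  # (origin, destination) -> avg_transit_days
    ("Shanghai", "Los Angeles"): 14,
    ("Rotterdam", "New York"): 10,
    ("Busan", "Seattle"): 11,
}

def scheduled_eta(origin, destination, ship_date):
    """Ship date plus the route's average transit days, joined on origin/destination."""
    return ship_date + timedelta(days=ROUTES[(origin, destination)])

for sid, origin, dest, shipped in SHIPMENTS:
    print(sid, scheduled_eta(origin, dest, shipped).isoformat())
```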

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

PREDICT FIRST(SHIPMENTS.actual_arrival, 0, 30, days)
FOR EACH SHIPMENTS.shipment_id

Prediction output

Every entity gets a score, updated continuously

SHIPMENT_ID	CARRIER	ORIGINAL_ETA	PREDICTED_ETA	DELAY_DAYS
SHP501	OceanLine Express	2025-03-06	2025-03-10	+4
SHP502	Atlantic Cargo	2025-03-04	2025-03-05	+1
SHP503	OceanLine Express	2025-03-08	2025-03-09	+1
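DELAY_DAYS is the calendar-day difference between PREDICTED_ETA and ORIGINAL_ETA. The sketch below recomputes it from the table and flags shipments for re-planning. The 2-day alert threshold and the `flag_late` helper are hypothetical:

```python
from datetime import date

# (shipment_id, ORIGINAL_ETA, PREDICTED_ETA) from the prediction output table above.
PREDICTIONS = [
    ("SHP501", date(2025, 3, 6), date(2025, 3, 10)),
    ("SHP502", date(2025, 3, 4), date(2025, 3, 5)),
    ("SHP503", date(2025, 3, 8), date(2025, 3, 9)),
]

def delay_days(original_eta, predicted_eta):
    """Positive means the prediction is later than the original ETA."""
    return (predicted_eta - original_eta).days

def flag_late(predictions, threshold_days=2):
    """Shipment ids whose predicted slip meets an (assumed) alert threshold."""
    return [sid for sid, orig, pred in predictions if delay_days(orig, pred) >= threshold_days]

for sid, orig, pred in PREDICTIONS:
    print(sid, f"{delay_days(orig, pred):+d}")
print("re-plan dock staffing for:", flag_late(PREDICTIONS))
```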

Understand why

Every prediction includes feature attributions — no black boxes

Shipment SHP501 -- Shanghai to Los Angeles via OceanLine Express

Predicted arrival: March 10 (+4 days delay)

Top contributing features

feature	value	attribution
LA port congestion (42 vessels waiting)	3.2-day avg wait	32%
Pacific storm on route (Mar 2)	Moderate severity	26%
Carrier OceanLine historical delay rate	28% late	18%
Current vessel position (behind schedule)	-1.5 days	14%
Fog advisory at destination (Mar 4)	Light	10%

Feature attributions are computed automatically for every prediction. No separate tooling required.

PQL Documentation

Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.

Python SDK

Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.

Explainability Docs

Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.

Frequently asked questions

Common questions about ETA prediction

How do you predict shipment arrival times accurately?

Model your logistics network as a graph connecting shipments, carriers, routes, weather, and port congestion. The key is capturing delay propagation -- how congestion at one port compounds with weather delays and carrier performance to produce the actual arrival time. Graph models do this naturally; per-shipment regression models cannot.

Why are carrier-provided ETAs so inaccurate?

Carriers have limited visibility beyond their own operations. They don't see port congestion at the destination until close to arrival, cannot predict weather routing changes, and have commercial incentives to provide optimistic estimates. Their ETAs are 40-60% inaccurate beyond 3 days out because they miss the cascading delay signals across the logistics network.

What data do you need for shipment ETA prediction?

Shipment records with carrier, route, origin, destination, and ship date. Carrier performance history (on-time rates, average delays by route). Real-time port congestion data (vessels waiting, average wait times). Weather forecasts along shipping routes. The more connected data you have, the better the model predicts compound delays.

How does port congestion affect shipment ETAs?

Port congestion creates a queue that affects every inbound vessel, but the delay varies by carrier (dock priority), vessel size, and current congestion trajectory (growing or shrinking). Graph models learn these conditional patterns: OceanLine Express at LA during 40+ vessel congestion averages 3.8 days delay, while Atlantic Cargo at the same congestion level averages 2.1 days because of better dock assignments.
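A hand-written version of one such conditional pattern: mean delay per carrier, restricted to arrivals during heavy congestion. The arrival rows are hypothetical and are not the data behind the 3.8-day and 2.1-day figures. A graph model learns these interactions from the data instead of being handed a fixed 40-vessel cutoff:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical arrival history at one port: (carrier, vessels_waiting_on_arrival, delay_days).
ARRIVALS = [
    ("OceanLine Express", 45, 4.0),
    ("OceanLine Express", 41, 3.5),
    ("OceanLine Express", 30, 2.0),
    ("Atlantic Cargo", 44, 2.0),
    ("Atlantic Cargo", 50, 2.5),
    ("Atlantic Cargo", 20, 0.5),
]

def delay_by_carrier(arrivals, min_vessels_waiting=40):
    """Mean delay per carrier, conditioned on congestion at or above a threshold."""
    groups = defaultdict(list)
    for carrier, waiting, delay in arrivals:
        if waiting >= min_vessels_waiting:
            groups[carrier].append(delay)
    return {carrier: mean(delays) for carrier, delays in groups.items()}

print(delay_by_carrier(ARRIVALS))
```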

What is the ROI of better ETA prediction?

A logistics company managing 100K monthly shipments saves $18M annually by reducing ETA error by 30%. The savings come from three sources: less wasted dock labor ($6M, from staffing to actual arrivals), lower expediting fees ($8M, from proactive rerouting), and fewer customer penalties ($4M, from accurate delivery promises).
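A quick consistency check using only numbers stated on this page: the three sources sum to the $18M headline, which works out to $15 per shipment per year at 100K shipments a month:

```python
MONTHLY_SHIPMENTS = 100_000
SAVINGS_BY_SOURCE_USD = {  # annual savings by source, as stated above
    "dock labor": 6_000_000,
    "expediting fees": 8_000_000,
    "customer penalties": 4_000_000,
}

annual_shipments = MONTHLY_SHIPMENTS * 12
total_savings = sum(SAVINGS_BY_SOURCE_USD.values())
per_shipment = total_savings / annual_shipments
shares = {source: usd / total_savings for source, usd in SAVINGS_BY_SOURCE_USD.items()}

print(f"total: ${total_savings:,} per year, ${per_shipment:.2f} per shipment")
for source, share in shares.items():
    print(f"{source}: {share:.0%}")
```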

Bottom line: A logistics company managing 100K monthly shipments saves $18M per year by reducing ETA error by 30%. Kumo's logistics graph captures delay propagation across routes, ports, weather, and carrier patterns that carrier-provided ETAs systematically miss.

Related use cases

Explore more supply chain use cases

Use Case #1: Demand Sensing

Use Case #2: Supplier Risk Scoring

Use Case #5: Quality Prediction

From a leadership team with proven experience

Vanja Josifovski

CEO and Co-Founder, ex-CTO Airbnb, ex-CTO Pinterest

Jure Leskovec

Co-Founder & Chief Scientist, Stanford Professor

Hema Raghavan

Co-Founder & Head of Engineering, ex-AI Lead, LinkedIn

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.