What data is needed for cancellation prediction?

Kumo connects directly to your existing relational tables: RESERVATIONS, GUESTS, PROPERTIES, WEATHER, EVENTS. No ETL or feature engineering required. Write a PQL query and get explainable predictions in minutes.

Use Case #4 · Binary Classification · Cancellation Prediction

Cancellation Prediction

“Will this reservation be cancelled?”

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example

Will this reservation be cancelled?

Hotels experience 20-40% cancellation rates, with last-minute cancellations (within 48 hours) being the most costly. Empty rooms from cancellations cost $50-200 per night in lost revenue. Overbooking to compensate risks costly walks ($200-500 per walked guest). For a chain with 50,000 rooms and a 30% cancellation rate, accurate cancellation prediction enables optimal overbooking that recovers $40-70M annually without increasing walk rates.
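To put the chain-level figure in per-room terms, the short calculation below divides the quoted $40-70M by 50,000 rooms and then by nights per year. It is only a unit conversion of the numbers above. The 365 sellable nights per room is an assumption added here, and `per_room_figures` is a hypothetical helper name.

```python
# Convert the quoted annual recovery into per-room terms.
# Assumption (not from the source): every room is sellable 365 nights a year.
ROOMS = 50_000
NIGHTS_PER_YEAR = 365

def per_room_figures(annual_recovery):
    """Return (recovery per room per year, recovery per room-night)."""
    per_room_year = annual_recovery / ROOMS
    return per_room_year, per_room_year / NIGHTS_PER_YEAR

for annual in (40_000_000, 70_000_000):
    per_year, per_night = per_room_figures(annual)
    print(f"${annual:,} -> ${per_year:,.0f} per room per year, ${per_night:.2f} per room-night")
```

Seen this way, the claim amounts to a few dollars of recovered revenue per room-night, which is small relative to the $50-200 cost of a single empty room.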

Quick answer

Cancellation prediction AI forecasts which hotel reservations will be cancelled by analyzing the booking graph: guest cancellation history, rate type (flexible vs. non-refundable), weather forecast changes, and event status shifts. Hotels experience 20-40% cancellation rates, with last-minute cancellations costing $50-200 per night. Graph-based models predict per-reservation cancellation probability, enabling optimal overbooking that recovers $40-70M annually for a chain with 50,000 rooms without increasing guest walk rates.

Approaches compared

4 ways to solve this problem

1. Historical Cancellation Rates

Apply the property's average cancellation rate to all reservations. If 30% of bookings cancel historically, assume 30% of tonight's bookings will cancel and overbook accordingly.

Best for

Properties with stable cancellation rates and consistent booking mix where aggregate rates are reasonably predictive.

Watch out for

Treats all reservations the same. A non-refundable booking from a Platinum loyalty member might cancel 5% of the time, while a flexible booking made 60 days ago by a first-time guest might cancel 55% of the time. Applying one average to both leads to overbooking too little (lost revenue) or too much (expensive walks). One walked guest costs $200-500 in compensation plus brand damage.
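A small worked example makes the averaging problem concrete. It uses the 5% and 55% rates mentioned above. The nightly booking mixes are hypothetical, and the 30% property average is what a 50/50 historical mix of those two segments would produce.

```python
# Rates come from the text above; the nightly booking mixes are hypothetical.
RATE_LOYAL_NONREFUNDABLE = 0.05
RATE_FIRST_TIME_FLEXIBLE = 0.55
PROPERTY_AVERAGE = 0.30  # average of the two rates under a 50/50 historical mix

def expected_cancellations(n_nonrefundable, n_flexible):
    """Return (estimate from segment rates, estimate from the flat property average)."""
    by_segment = (n_nonrefundable * RATE_LOYAL_NONREFUNDABLE
                  + n_flexible * RATE_FIRST_TIME_FLEXIBLE)
    flat = (n_nonrefundable + n_flexible) * PROPERTY_AVERAGE
    return by_segment, flat

for mix in ((80, 20), (50, 50), (20, 80)):
    by_segment, flat = expected_cancellations(*mix)
    print(f"non-refundable/flexible = {mix}: segment-aware {by_segment:.0f}, flat average {flat:.0f}")
```

With 100 bookings the flat average always expects about 30 cancellations, while the segment-aware estimate ranges from about 15 to about 45 depending on the night's mix.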

2. Rate-Type Segmentation

Apply different cancellation rates by rate type: flexible rates cancel at 35-45%, non-refundable at 5-10%. Better than aggregate rates.

Best for

Hotels with a clear mix of rate types and enough volume per segment for stable estimates.

Watch out for

Rate type is one factor among many. A flexible booking from a guest with zero cancellation history is very different from a flexible booking by a guest who cancels 50% of reservations. Segmentation by rate type alone misses guest-specific, weather-specific, and event-specific cancellation drivers that account for 40-50% of the variation.
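The sample rows shown later on this page illustrate the gap. The sketch below applies a single segment rate per rate type and compares it with each guest's own history. The reservation and guest values are copied from the sample tables. The segment rates are midpoints of the quoted ranges (40% and 7.5%), chosen here for illustration only.

```python
# Segment rates: midpoints of the ranges quoted above (an illustrative choice).
SEGMENT_RATE = {"Flexible": 0.40, "Non-refundable": 0.075}

# (reservation_id, guest_id, rate_type) from the sample RESERVATIONS table.
RESERVATIONS = [
    ("RES701", "GST101", "Non-refundable"),
    ("RES702", "GST102", "Flexible"),
    ("RES703", "GST103", "Flexible"),
]

# guest_id -> (past_cancellations, total_bookings) from the sample GUESTS table.
GUESTS = {"GST101": (1, 15), "GST102": (4, 8), "GST103": (0, 22)}

def compare_rates():
    """Return (reservation_id, segment rate, guest's own historical rate) per reservation."""
    rows = []
    for reservation_id, guest_id, rate_type in RESERVATIONS:
        cancelled, total = GUESTS[guest_id]
        rows.append((reservation_id, SEGMENT_RATE[rate_type], cancelled / total))
    return rows

for reservation_id, segment_rate, guest_rate in compare_rates():
    print(f"{reservation_id}: segment rate {segment_rate:.1%}, guest history {guest_rate:.1%}")
```

RES702 and RES703 receive the same flexible-rate estimate even though one guest has cancelled half of their bookings and the other has never cancelled.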

3. Logistic Regression on Booking Features

Build a cancellation model using booking features: rate type, lead time, guest history, day of week, and special requests. The standard ML approach for cancellation prediction.

Best for

Hotels with clean booking data and enough cancellation history to train robust models.

Watch out for

Treats each reservation independently. Cannot capture that a weather forecast change from sunny to storms increases cancellation probability for all leisure bookings on affected dates simultaneously. Also misses event-driven correlations: when a conference is postponed, all associated bookings face elevated cancellation risk. These are network effects that per-reservation models miss.
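For reference, a per-reservation logistic regression can be sketched in plain Python as below. This is a minimal illustration, not a production model: the training rows are hypothetical, the three features (flexible flag, lead time divided by 100, guest cancellation rate) are a simplified subset of those listed above, and the fit uses basic batch gradient descent rather than any particular library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Cancellation probability for one reservation, scored on its own features only."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

# Hypothetical history: [is_flexible, lead_time_days / 100, guest_cancel_rate] -> cancelled?
TRAIN_X = [
    [1, 0.45, 0.50], [1, 0.60, 0.40], [1, 0.10, 0.00], [1, 0.30, 0.10], [0, 0.20, 0.07],
    [0, 0.50, 0.30], [0, 0.05, 0.00], [1, 0.05, 0.05], [0, 0.40, 0.50], [1, 0.50, 0.25],
]
TRAIN_Y = [1, 1, 0, 1, 0, 0, 0, 0, 1, 1]

weights, bias = train_logistic(TRAIN_X, TRAIN_Y)
risky = predict(weights, bias, [1, 0.45, 0.50])  # flexible, 45-day lead, 50% history
safe = predict(weights, bias, [0, 0.20, 0.07])   # non-refundable, 20-day lead, 7% history
print(f"risky booking: {risky:.2f}, safe booking: {safe:.2f}")
```

Each call to `predict` sees one row at a time, so nothing in this setup can express that two bookings share the same storm forecast or the same postponed event.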

4. Graph Neural Networks (Kumo's Approach)

Connect reservations, guests, properties, weather, and events into a booking graph. GNNs learn cancellation patterns from the full reservation network, including weather-driven correlated cancellations and event-status impacts.

Best for

Hotels in weather-sensitive or event-driven markets where correlated cancellations (multiple cancellations triggered by the same cause) drive overbooking risk.

Watch out for

Requires weather forecast data and event calendar integration beyond standard PMS data. Best value for hotels with 200+ rooms where overbooking decisions have significant revenue impact.

Key metric: Graph-based cancellation prediction captures correlated cancellation events (weather shifts, event cancellations) that independent models miss. Optimal overbooking based on per-reservation predictions recovers $40-70M annually for a 50,000-room chain without increasing walk rates.

Why relational data changes the answer

Cancellations are not independent events. When the weather forecast for Miami changes from sunny to storms, every leisure reservation for the affected dates faces elevated cancellation risk simultaneously. When the Tech Summit in Orlando is postponed, the 200 bookings associated with that event will cancel in a correlated wave. Per-reservation models see each booking as an independent prediction. They might correctly predict that Reservation RES702 has a 72% cancellation probability, but they cannot predict that 15 other flexible bookings for the same date will also cancel because they share the same weather trigger.

This matters enormously for overbooking. Suppose 15 correlated cancellations hit the same night. If you overbooked by 10, only 5 rooms go empty. If you overbooked by just 5 because an independent model underestimated the correlated risk, 10 rooms go empty that you could have sold. Graph-based models capture these correlations by connecting reservations to shared causes (weather, events, group bookings). SAP's SALT benchmark shows 91% accuracy for graph models vs 63% for gradient-boosted trees on relational tasks. RelBench shows a similar gap: 76.71 vs 62.44. For cancellation prediction, this means optimal overbooking that maximizes revenue within walk-rate constraints, recovering $40-70M annually for a 50,000-room chain.
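The effect of a shared trigger can be illustrated with a simple probability calculation. The sketch compares two hypothetical worlds with the same 30% average cancellation rate across 50 bookings: one where bookings cancel independently, and one where a storm (20% chance) pushes every booking's rate to 70% while clear weather leaves it at 20%. All of these parameters are assumptions chosen for illustration.

```python
from math import comb

def binomial_tail(n, p, threshold):
    """P(at least `threshold` of n bookings cancel) when each cancels with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(threshold, n + 1))

# All parameters below are hypothetical.
N_BOOKINGS = 50
THRESHOLD = 25            # a "wave" of 25 or more cancellations
P_INDEPENDENT = 0.30
P_STORM, RATE_IF_STORM, RATE_IF_CLEAR = 0.20, 0.70, 0.20

# Both worlds have the same average cancellation rate per booking.
mean_rate_shared = P_STORM * RATE_IF_STORM + (1 - P_STORM) * RATE_IF_CLEAR

tail_independent = binomial_tail(N_BOOKINGS, P_INDEPENDENT, THRESHOLD)
tail_shared = (P_STORM * binomial_tail(N_BOOKINGS, RATE_IF_STORM, THRESHOLD)
               + (1 - P_STORM) * binomial_tail(N_BOOKINGS, RATE_IF_CLEAR, THRESHOLD))

print(f"average rate in both worlds: {mean_rate_shared:.2f}")
print(f"P(25+ cancellations), independent: {tail_independent:.4f}")
print(f"P(25+ cancellations), shared storm trigger: {tail_shared:.4f}")
```

The averages match, but a large cancellation wave is far more likely when bookings share a trigger, which is the risk an independence assumption hides.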

Predicting cancellations independently is like predicting individual employee sick days without knowing there is a flu going around the office. You would underestimate the total because the events are correlated: when one person gets sick, their desk neighbors are more likely to call in sick too. Hotel cancellations work the same way. A weather change or event cancellation is the 'flu' that triggers correlated cancellations across many reservations. Graph-based models track the flu, not just individual symptoms.

How KumoRFM solves this

Graph-powered intelligence for travel and hospitality

Kumo connects reservations, guests, properties, weather, and events into a booking graph. The GNN learns cancellation patterns from the full reservation network: how booking lead time interacts with rate type, how weather forecast changes trigger leisure cancellations, how group bookings create correlated cancellation risk, and how guest history predicts individual cancellation behavior. PQL predicts cancellation probability per reservation, enabling overbooking decisions that maximize revenue within walk-rate constraints.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Your data

The relational tables Kumo learns from

RESERVATIONS

reservation_id	guest_id	room_type	check_in	rate_type
RES701	GST101	King Standard	2025-03-14	Non-refundable
RES702	GST102	King Deluxe	2025-03-14	Flexible
RES703	GST103	Suite	2025-03-15	Flexible

GUESTS

guest_id	past_cancellations	total_bookings	loyalty_tier
GST101	1	15	Gold
GST102	4	8	None
GST103	0	22	Platinum

PROPERTIES

property_id	name	market	avg_cancellation_rate
HTL201	Beachfront Resort	Miami	32%
HTL202	Convention Center Hotel	Orlando	28%

WEATHER

market	date	forecast	change_from_yesterday
Miami	2025-03-14	Rain/Storms	Was: Sunny
Miami	2025-03-15	Partly Cloudy	No change

EVENTS

event_id	market	name	status	date
EVT201	Miami	Beach Music Fest	Confirmed	2025-03-15
EVT202	Orlando	Tech Summit	Postponed	2025-03-14
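To show how these tables relate, here is a rough sketch that links the sample rows into a plain adjacency map. It is not Kumo's internal representation. One assumption matters: the RESERVATIONS sample does not show which property each booking belongs to, so the `property_id` values below are hypothetical. Everything else is copied from the tables above.

```python
from collections import defaultdict

# (reservation_id, guest_id, check_in, property_id)
# property_id is a HYPOTHETICAL column; the sample RESERVATIONS table does not include it.
RESERVATIONS = [
    ("RES701", "GST101", "2025-03-14", "HTL201"),
    ("RES702", "GST102", "2025-03-14", "HTL201"),
    ("RES703", "GST103", "2025-03-15", "HTL201"),
]
PROPERTY_MARKET = {"HTL201": "Miami", "HTL202": "Orlando"}
WEATHER = {("Miami", "2025-03-14"), ("Miami", "2025-03-15")}
EVENTS = [("EVT201", "Miami", "2025-03-15"), ("EVT202", "Orlando", "2025-03-14")]

def build_graph():
    """Link each reservation to its guest, property, and same-market weather and events."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for reservation_id, guest_id, check_in, property_id in RESERVATIONS:
        node = ("reservation", reservation_id)
        link(node, ("guest", guest_id))
        link(node, ("property", property_id))
        market = PROPERTY_MARKET[property_id]
        if (market, check_in) in WEATHER:
            link(node, ("weather", market, check_in))
        for event_id, event_market, event_date in EVENTS:
            if event_market == market and event_date == check_in:
                link(node, ("event", event_id))
    return graph

graph = build_graph()
print(sorted(graph[("weather", "Miami", "2025-03-14")]))
```

Under these assumptions, RES701 and RES702 both attach to the Miami weather node for 2025-03-14, so a forecast change on that node is visible from both reservations.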

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

PREDICT BOOL(RESERVATIONS.status = 'Cancelled', 0, 14, days)
FOR EACH RESERVATIONS.reservation_id
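One plausible reading of this query is a binary label per reservation: does the reservation become cancelled within the 14 days after the prediction date? The sketch below spells out that label in Python. It is an interpretation, not Kumo's implementation. The `cancelled_at` value and the function name are hypothetical, and the sample RESERVATIONS table does not show a status or cancellation timestamp column.

```python
from datetime import date, timedelta

def cancelled_within_window(anchor, cancelled_at, horizon_days=14):
    """True if a cancellation falls in (anchor, anchor + horizon_days].

    anchor: the date the prediction is made.
    cancelled_at: hypothetical cancellation date, or None if never cancelled.
    """
    if cancelled_at is None:
        return False
    return anchor < cancelled_at <= anchor + timedelta(days=horizon_days)

anchor = date(2025, 3, 1)
print(cancelled_within_window(anchor, date(2025, 3, 10)))  # inside the 14-day window
print(cancelled_within_window(anchor, date(2025, 3, 20)))  # after the window closes
print(cancelled_within_window(anchor, None))               # never cancelled
```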

Prediction output

Every entity gets a score, updated continuously

RESERVATION_ID	GUEST	CHECK_IN	CANCEL_PROB	RISK_TIER
RES701	GST101	2025-03-14	0.15	Low
RES702	GST102	2025-03-14	0.72	Critical
RES703	GST103	2025-03-15	0.05	Low
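Per-reservation probabilities like these can be rolled up into an expected number of cancellations. The sketch below does this for the three sample scores and also computes the full distribution of the cancellation count. It treats the three reservations as independent, which is a simplification made only to keep the arithmetic short, and the function name is hypothetical.

```python
def cancellation_count_distribution(probs):
    """Distribution of the number of cancellations, assuming independent reservations.

    Returns a list where entry k is P(exactly k cancellations).
    """
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)      # this reservation is kept
            nxt[k + 1] += mass * p        # this reservation is cancelled
        dist = nxt
    return dist

CANCEL_PROB = {"RES701": 0.15, "RES702": 0.72, "RES703": 0.05}

expected = sum(CANCEL_PROB.values())
dist = cancellation_count_distribution(list(CANCEL_PROB.values()))
print(f"expected cancellations: {expected:.2f}")
for k, mass in enumerate(dist):
    print(f"P({k} cancellations) = {mass:.4f}")
```

For these three scores the expected count is 0.92, and the chance that none of the three cancels is 0.85 × 0.28 × 0.95, roughly 0.23.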

Understand why

Every prediction includes feature attributions — no black boxes

Reservation RES702 -- Guest GST102, Flexible King Deluxe Mar 14

Predicted: 72% cancellation probability (Critical)

Top contributing features

Feature	Value	Attribution
Flexible rate type (no cancellation penalty)	Flexible	28%
Guest historical cancellation rate	50% (4 of 8)	25%
Weather forecast change (sunny to storms)	Negative shift	21%
No loyalty tier (low switching cost)	None	15%
Booking lead time (long = higher cancel)	45 days	11%

Feature attributions are computed automatically for every prediction. No separate tooling required.

PQL Documentation

Learn the Predictive Query Language: SQL-like syntax for defining any prediction task in 2–3 lines.

Python SDK

Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.

Explainability Docs

Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.

Frequently asked questions

Common questions about cancellation prediction

What is the optimal overbooking rate for hotels?

There is no universal answer because it depends on cancellation rates, walk costs, and revenue per room. The optimal strategy maximizes expected revenue: overbooking enough to fill rooms from cancellations but not so much that walk costs exceed the additional revenue. Graph-based models calculate the optimal overbooking per night per room type based on predicted cancellation volume and correlation. Typical optimal overbooking is 5-15% above physical capacity, but this varies daily based on the specific reservation mix.
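The revenue-versus-walk-cost trade-off can be sketched as a small expected-value search. The example below is a simplified illustration, not the method Kumo uses: it assumes every booking cancels independently with the same probability, and the numbers are hypothetical (100 rooms, 10% remaining cancellation probability, $150 revenue per occupied room, $350 per walked guest, the last two chosen from within the ranges quoted on this page).

```python
from math import comb

def expected_profit(capacity, overbook, cancel_prob, room_revenue, walk_cost):
    """Expected revenue minus walk costs when capacity + overbook rooms are sold."""
    booked = capacity + overbook
    show_prob = 1.0 - cancel_prob
    total = 0.0
    for shows in range(booked + 1):
        prob = comb(booked, shows) * show_prob**shows * cancel_prob**(booked - shows)
        occupied = min(shows, capacity)
        walked = max(shows - capacity, 0)
        total += prob * (room_revenue * occupied - walk_cost * walked)
    return total

def best_overbooking(capacity, cancel_prob, room_revenue, walk_cost, max_overbook=30):
    """Try every overbooking level up to max_overbook and return (best level, its profit)."""
    best = max(range(max_overbook + 1),
               key=lambda k: expected_profit(capacity, k, cancel_prob, room_revenue, walk_cost))
    return best, expected_profit(capacity, best, cancel_prob, room_revenue, walk_cost)

# Hypothetical inputs for a single night.
best_k, best_value = best_overbooking(capacity=100, cancel_prob=0.10,
                                      room_revenue=150, walk_cost=350)
baseline = expected_profit(100, 0, 0.10, 150, 350)
print(f"best overbooking level: {best_k} rooms")
print(f"expected profit: ${best_value:,.0f} vs ${baseline:,.0f} with no overbooking")
```

Raising the walk cost or lowering the cancellation probability pushes the best level down, which is why the answer changes from night to night.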

How much does walking a guest cost?

Direct costs are $200-500 per walked guest: covering the cost at a comparable hotel, transportation, and compensation (future stay credits, loyalty points). Indirect costs are harder to measure but significant: negative reviews, lost loyalty, and reduced future bookings. For loyalty program members, the lifetime value impact of a walk can be $5,000-$20,000. This is why prediction accuracy matters: optimal overbooking that minimizes walks while maximizing occupancy is a high-value optimization.

How does cancellation prediction handle group bookings?

Group bookings create correlated cancellation risk: when a group cancels, 10-200 rooms cancel simultaneously. Graph-based models represent group bookings as connected nodes, so if the group coordinator's behavior changes (reduced room block, delayed payment), the model increases cancellation probability for the entire block. This prevents the catastrophic scenario where overbooking based on independent estimates leaves the hotel with 50 empty rooms from a surprise group cancellation.

Can cancellation prediction help with rate strategy?

Yes. The model predicts cancellation probability by rate type, which informs the mix of flexible vs. non-refundable rates to offer. If the model predicts high cancellation risk for a specific date (weather-sensitive weekend), the hotel can shift more inventory to non-refundable rates at a discount, reducing cancellation volume while maintaining revenue. This rate-mix optimization is often worth more than overbooking optimization alone.

How quickly does cancellation probability change as check-in approaches?

Cancellation probability evolves significantly in the final 14 days before check-in. At 30 days out, a flexible booking might have a 40% cancellation probability. At 7 days, if the guest has not cancelled and weather looks good, it drops to 15%. At 48 hours, it drops to 5%. The model updates daily, allowing revenue managers to adjust overbooking levels as check-in approaches. The final 48-hour window is where the most expensive cancellations occur and where accurate prediction has the highest per-room value.
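One way to see why the probability falls as check-in approaches is a conditional-probability calculation: a booking that has survived so far has already passed through part of the period in which cancellations happen. The sketch below is an illustrative simplification that ignores weather and other signals. The cumulative timing shares are hypothetical values back-solved to reproduce the 15% and 5% figures above; they are not measured data.

```python
def remaining_cancel_prob(initial_prob, share_already_elapsed):
    """P(cancels later | not cancelled so far).

    initial_prob: cancellation probability at the starting point (e.g. 30 days out).
    share_already_elapsed: fraction of eventual cancellations that would have
        happened by now (hypothetical timing curve).
    """
    still_to_come = initial_prob * (1 - share_already_elapsed)
    survived_so_far = 1 - initial_prob * share_already_elapsed
    return still_to_come / survived_so_far

INITIAL = 0.40  # flexible booking at 30 days out, from the text above

# Hypothetical timing shares, back-solved to match the illustrative figures.
for label, share in (("30 days out", 0.0), ("7 days out", 0.735), ("48 hours out", 0.921)):
    print(f"{label}: {remaining_cancel_prob(INITIAL, share):.2f}")
```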

Bottom line: A hotel chain with 50,000 rooms recovers $40-70M annually by optimizing overbooking based on per-reservation cancellation predictions. Kumo's booking graph connects guest history, weather shifts, and event status changes to predict cancellations before they happen.

Related use cases

Explore more travel & hospitality use cases

Use Case #1: Dynamic Pricing

Use Case #2: Booking Prediction

Use Case #3: Guest Personalization


Topics covered

cancellation prediction AI, hotel cancellation model, reservation cancellation ML, no-show prediction hospitality, overbooking optimization, KumoRFM cancellation, booking cancellation forecast, revenue protection hotel

From a leadership team with proven experience

Vanja Josifovski

CEO and Co-Founder, ex-CTO Airbnb, ex-CTO Pinterest

Jure Leskovec

Co-Founder & Chief Scientist, Stanford Professor

Hema Raghavan

Co-Founder & Head of Engineering, ex-AI Lead, LinkedIn

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
