Cancellation Prediction
“Will this reservation be cancelled?”
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example
Will this reservation be cancelled?
Hotels experience 20-40% cancellation rates, with last-minute cancellations (within 48 hours) being the most costly. Each room left empty by a cancellation costs $50-200 per night in lost revenue. Overbooking to compensate risks costly walks ($200-500 per walked guest). For a chain with 50,000 rooms and a 30% cancellation rate, accurate cancellation prediction enables optimal overbooking that recovers $40-70M annually without increasing walk rates.
Quick answer
Cancellation prediction AI forecasts which hotel reservations will be cancelled by analyzing the booking graph: guest cancellation history, rate type (flexible vs. non-refundable), weather forecast changes, and event status shifts. Hotels experience 20-40% cancellation rates, with last-minute cancellations costing $50-200 per night. Graph-based models predict per-reservation cancellation probability, enabling optimal overbooking that recovers $40-70M annually for a chain with 50,000 rooms without increasing guest walk rates.
Approaches compared
4 ways to solve this problem
1. Historical Cancellation Rates
Apply the property's average cancellation rate to all reservations. If 30% of bookings cancel historically, assume 30% of tonight's bookings will cancel and overbook accordingly.
Best for
Properties with stable cancellation rates and consistent booking mix where aggregate rates are reasonably predictive.
Watch out for
Treats all reservations the same. A non-refundable booking from a Platinum loyalty member might cancel only 5% of the time, while a flexible booking made 60 days ago by a first-time guest might cancel 55% of the time. Averaging these together leads to overbooking either too little (lost revenue) or too much (expensive walks). One walked guest costs $200-500 in compensation plus brand damage.
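The gap between an aggregate rate and per-reservation rates can be made concrete with a small sketch. It reuses the two illustrative rates quoted above (5% and 55%) and the 30% property average; the two 100-booking mixes are hypothetical and chosen only to show the effect.

```python
# Illustrative rates from the text above; the booking counts are hypothetical.
NON_REFUNDABLE_LOYAL = 0.05   # non-refundable booking, Platinum member
FLEXIBLE_FIRST_TIME = 0.55    # flexible booking, first-time guest, long lead time
PROPERTY_AVERAGE = 0.30       # historical property-wide cancellation rate


def expected_cancellations(n_low_risk: int, n_high_risk: int) -> float:
    """Expected cancellations when each booking keeps its own rate."""
    return n_low_risk * NON_REFUNDABLE_LOYAL + n_high_risk * FLEXIBLE_FIRST_TIME


def aggregate_estimate(n_low_risk: int, n_high_risk: int) -> float:
    """Expected cancellations when every booking gets the property average."""
    return (n_low_risk + n_high_risk) * PROPERTY_AVERAGE


# Two nights with 100 bookings each, but a different booking mix.
for mix in [(80, 20), (20, 80)]:
    print(mix, round(aggregate_estimate(*mix), 1), round(expected_cancellations(*mix), 1))
```

The aggregate estimate is about 30 cancellations for both nights, while the per-booking estimate is about 15 for the first mix and about 45 for the second. Overbooking by 30 would risk walks on the first night and leave rooms empty on the second.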
2. Rate-Type Segmentation
Apply different cancellation rates by rate type: flexible rates cancel at 35-45%, non-refundable at 5-10%. Better than aggregate rates.
Best for
Hotels with a clear mix of rate types and enough volume per segment for stable estimates.
Watch out for
Rate type is one factor among many. A flexible booking from a guest with zero cancellation history is very different from a flexible booking by a guest who cancels 50% of reservations. Segmentation by rate type alone misses guest-specific, weather-specific, and event-specific cancellation drivers that account for 40-50% of the variation.
3. Logistic Regression on Booking Features
Build a cancellation model using booking features: rate type, lead time, guest history, day of week, and special requests. The standard ML approach for cancellation prediction.
Best for
Hotels with clean booking data and enough cancellation history to train robust models.
Watch out for
Treats each reservation independently. Cannot capture that a weather forecast change from sunny to storms increases cancellation probability for all leisure bookings on affected dates simultaneously. Also misses event-driven correlations: when a conference is postponed, all associated bookings face elevated cancellation risk. These are network effects that per-reservation models miss.
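As a rough illustration of why this approach scores each reservation in isolation, here is a minimal logistic scoring function. The feature names follow the list above, but the coefficients and intercept are hypothetical values chosen for illustration; a real model would learn them from historical bookings.

```python
import math

# Hypothetical coefficients for illustration only (not fitted to any data).
COEFFICIENTS = {
    "is_flexible": 1.8,          # 1.0 for flexible rate, 0.0 for non-refundable
    "lead_time_days": 0.02,
    "guest_cancel_rate": 2.5,    # past cancellations / total bookings
    "has_special_request": -0.6,
}
INTERCEPT = -3.0


def cancel_probability(features: dict) -> float:
    """Logistic model: sigmoid of a weighted sum of one booking's features."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


flexible_booking = {"is_flexible": 1.0, "lead_time_days": 45,
                    "guest_cancel_rate": 0.5, "has_special_request": 0.0}
non_refundable_booking = {"is_flexible": 0.0, "lead_time_days": 10,
                          "guest_cancel_rate": 1 / 15, "has_special_request": 1.0}

print(cancel_probability(flexible_booking))
print(cancel_probability(non_refundable_booking))
```

Each call sees only the row it is given. Nothing in the function lets a forecast change or a postponed event raise the score of every affected booking at once, unless that signal is engineered into each row by hand.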
4. Graph Neural Networks (Kumo's Approach)
Connect reservations, guests, properties, weather, and events into a booking graph. GNNs learn cancellation patterns from the full reservation network, including weather-driven correlated cancellations and event-status impacts.
Best for
Hotels in weather-sensitive or event-driven markets where correlated cancellations (multiple cancellations triggered by the same cause) drive overbooking risk.
Watch out for
Requires weather forecast data and event calendar integration beyond standard PMS data. Best value for hotels with 200+ rooms where overbooking decisions have significant revenue impact.
Key metric: Graph-based cancellation prediction captures correlated cancellation events (weather shifts, event cancellations) that independent models miss. Optimal overbooking based on per-reservation predictions recovers $40-70M annually for a 50,000-room chain without increasing walk rates.
Why relational data changes the answer
Cancellations are not independent events. When the weather forecast for Miami changes from sunny to storms, every leisure reservation for the affected dates faces elevated cancellation risk simultaneously. When the Tech Summit in Orlando is postponed, the 200 bookings associated with that event will cancel in a correlated wave. Per-reservation models see each booking as an independent prediction. They might correctly predict that Reservation RES702 has a 72% cancellation probability, but they cannot predict that 15 other flexible bookings for the same date will also cancel because they share the same weather trigger.
This matters enormously for overbooking. Suppose 15 cancellations hit simultaneously. If you overbooked by 10, you absorb most of the wave: no guest is walked and only 5 rooms go empty. But if you overbooked by only 5 because an independent model underestimated correlated cancellation risk, you have 10 empty rooms you could have sold. Graph-based models capture these correlations by connecting reservations to shared causes (weather, events, group bookings). SAP's SALT benchmark shows 91% accuracy for graph models vs 63% for gradient-boosted trees on relational tasks, and RelBench shows a similar gap (76.71 vs 62.44). For cancellation prediction, this means optimal overbooking that maximizes revenue within walk-rate constraints, recovering $40-70M annually for a 50,000-room chain.
Predicting cancellations independently is like predicting individual employee sick days without knowing there is a flu going around the office. You would underestimate the total because the events are correlated: when one person gets sick, their desk neighbors are more likely to call in sick too. Hotel cancellations work the same way. A weather change or event cancellation is the 'flu' that triggers correlated cancellations across many reservations. Graph-based models track the flu, not just individual symptoms.
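The effect of a shared cause can be sketched with a small Monte Carlo simulation. All parameters here are hypothetical: 50 bookings, a 30% chance of a storm, and a cancellation rate of 60% per booking if the storm hits versus 15% if it does not. The independent case gives every booking the same average risk but no shared trigger.

```python
import random


def simulate(n_bookings, trials, rng, p_storm, p_cancel_storm, p_cancel_clear, correlated):
    """Return the total number of cancellations observed in each trial."""
    p_marginal = p_storm * p_cancel_storm + (1 - p_storm) * p_cancel_clear
    totals = []
    for _ in range(trials):
        if correlated:
            # One shared draw decides the weather for every booking that night.
            p = p_cancel_storm if rng.random() < p_storm else p_cancel_clear
        else:
            # Same average risk per booking, but no shared cause.
            p = p_marginal
        totals.append(sum(rng.random() < p for _ in range(n_bookings)))
    return totals


def tail_probability(totals, threshold):
    """Share of trials with at least `threshold` cancellations."""
    return sum(t >= threshold for t in totals) / len(totals)


rng = random.Random(42)
params = dict(n_bookings=50, trials=5000, rng=rng,
              p_storm=0.3, p_cancel_storm=0.6, p_cancel_clear=0.15)
independent = simulate(correlated=False, **params)
shared_cause = simulate(correlated=True, **params)

for name, totals in [("independent", independent), ("shared cause", shared_cause)]:
    print(name, sum(totals) / len(totals), tail_probability(totals, 25))
```

Both cases have roughly the same average number of cancellations, but nights with 25 or more cancellations are rare in the independent case and common in the shared-cause case. An overbooking level tuned to the independent picture misses exactly those nights.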
How KumoRFM solves this
Graph-powered intelligence for travel and hospitality
Kumo connects reservations, guests, properties, weather, and events into a booking graph. The GNN learns cancellation patterns from the full reservation network: how booking lead time interacts with rate type, how weather forecast changes trigger leisure cancellations, how group bookings create correlated cancellation risk, and how guest history predicts individual cancellation behavior. PQL predicts cancellation probability per reservation, enabling overbooking decisions that maximize revenue within walk-rate constraints.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
RESERVATIONS
| reservation_id | guest_id | room_type | check_in | rate_type |
|---|---|---|---|---|
| RES701 | GST101 | King Standard | 2025-03-14 | Non-refundable |
| RES702 | GST102 | King Deluxe | 2025-03-14 | Flexible |
| RES703 | GST103 | Suite | 2025-03-15 | Flexible |
GUESTS
| guest_id | past_cancellations | total_bookings | loyalty_tier |
|---|---|---|---|
| GST101 | 1 | 15 | Gold |
| GST102 | 4 | 8 | None |
| GST103 | 0 | 22 | Platinum |
PROPERTIES
| property_id | name | market | avg_cancellation_rate |
|---|---|---|---|
| HTL201 | Beachfront Resort | Miami | 32% |
| HTL202 | Convention Center Hotel | Orlando | 28% |
WEATHER
| market | date | forecast | change_from_yesterday |
|---|---|---|---|
| Miami | 2025-03-14 | Rain/Storms | Was: Sunny |
| Miami | 2025-03-15 | Partly Cloudy | No change |
EVENTS
| event_id | market | name | status | date |
|---|---|---|---|---|
| EVT201 | Miami | Beach Music Fest | Confirmed | 2025-03-15 |
| EVT202 | Orlando | Tech Summit | Postponed | 2025-03-14 |
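To show how these tables turn into a graph, the sketch below builds reservation-to-guest edges, a guest-level feature, and shared-cause nodes keyed by market and date. It is an illustration in plain Python, not Kumo's internal representation. The `changed` flag is an assumed encoding of the `change_from_yesterday` column, and because the sample RESERVATIONS rows show no property or market column, the reservation-to-market link is left out.

```python
from collections import defaultdict

# Rows copied from the sample tables above (columns trimmed for brevity).
reservations = [
    {"reservation_id": "RES701", "guest_id": "GST101", "check_in": "2025-03-14"},
    {"reservation_id": "RES702", "guest_id": "GST102", "check_in": "2025-03-14"},
    {"reservation_id": "RES703", "guest_id": "GST103", "check_in": "2025-03-15"},
]
guests = {
    "GST101": {"past_cancellations": 1, "total_bookings": 15},
    "GST102": {"past_cancellations": 4, "total_bookings": 8},
    "GST103": {"past_cancellations": 0, "total_bookings": 22},
}
weather = [
    {"market": "Miami", "date": "2025-03-14", "forecast": "Rain/Storms", "changed": True},
    {"market": "Miami", "date": "2025-03-15", "forecast": "Partly Cloudy", "changed": False},
]
events = [
    {"market": "Miami", "date": "2025-03-15", "name": "Beach Music Fest", "status": "Confirmed"},
    {"market": "Orlando", "date": "2025-03-14", "name": "Tech Summit", "status": "Postponed"},
]

# Edges of the form (source node, relation, target node).
edges = [(r["reservation_id"], "booked_by", r["guest_id"]) for r in reservations]

# Node feature: each guest's historical cancellation rate.
guest_cancel_rate = {
    guest_id: g["past_cancellations"] / g["total_bookings"]
    for guest_id, g in guests.items()
}

# Shared-cause signals keyed by (market, date): every reservation in that
# market on that date would attach to the same node.
shared_causes = defaultdict(list)
for w in weather:
    if w["changed"]:
        shared_causes[(w["market"], w["date"])].append(f"weather:{w['forecast']}")
for e in events:
    if e["status"] != "Confirmed":
        shared_causes[(e["market"], e["date"])].append(f"event:{e['name']}:{e['status']}")

print(guest_cancel_rate["GST102"])  # 0.5 (4 of 8)
print(dict(shared_causes))
```

The point of the (market, date) key is that one forecast change or one postponed event becomes a single node that many reservations share, which is what lets a graph model treat their cancellation risk as linked rather than independent.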
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT BOOL(RESERVATIONS.status = 'Cancelled', 0, 14, days) FOR EACH RESERVATIONS.reservation_id
Prediction output
Every entity gets a score, updated continuously
| RESERVATION_ID | GUEST | CHECK_IN | CANCEL_PROB | RISK_TIER |
|---|---|---|---|---|
| RES701 | GST101 | 2025-03-14 | 0.15 | Low |
| RES702 | GST102 | 2025-03-14 | 0.72 | Critical |
| RES703 | GST103 | 2025-03-15 | 0.05 | Low |
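One simple way to use this output, sketched here with the three sample rows, is to sum the probabilities per check-in date. The sum is the expected number of cancellations for that night, and that holds whether or not the cancellations are correlated; correlation changes the spread around the expectation, not the expectation itself.

```python
from collections import defaultdict

# (reservation_id, check_in, cancel_prob) copied from the sample output above.
predictions = [
    ("RES701", "2025-03-14", 0.15),
    ("RES702", "2025-03-14", 0.72),
    ("RES703", "2025-03-15", 0.05),
]

expected_by_date = defaultdict(float)
for _reservation_id, check_in, cancel_prob in predictions:
    expected_by_date[check_in] += cancel_prob

for check_in, expected in sorted(expected_by_date.items()):
    print(check_in, round(expected, 2))
```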
Understand why
Every prediction includes feature attributions — no black boxes
Reservation RES702: Guest GST102, Flexible King Deluxe, Mar 14
Predicted: 72% cancellation probability (Critical)
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| Flexible rate type (no cancellation penalty) | Flexible | 28% |
| Guest historical cancellation rate | 50% (4 of 8) | 25% |
| Weather forecast change (sunny to storms) | Negative shift | 21% |
| No loyalty tier (low switching cost) | None | 15% |
| Booking lead time (long = higher cancel) | 45 days | 11% |
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about cancellation prediction
What is the optimal overbooking rate for hotels?
There is no universal answer because it depends on cancellation rates, walk costs, and revenue per room. The optimal strategy maximizes expected revenue: overbooking enough to fill rooms from cancellations but not so much that walk costs exceed the additional revenue. Graph-based models calculate the optimal overbooking per night per room type based on predicted cancellation volume and correlation. Typical optimal overbooking is 5-15% above physical capacity, but this varies daily based on the specific reservation mix.
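The trade-off can be sketched under a deliberately simplified model. Assume the hotel is fully booked, each existing booking cancels independently with its predicted probability, every extra booking shows up, and revenue per room and walk cost are fixed. Selling the k-th extra room pays off only if at least k existing bookings cancel, so it is worth selling while P(cancellations >= k) >= walk_cost / (room_revenue + walk_cost). The dollar figures and the booking mix below are hypothetical values within the ranges quoted on this page.

```python
def cancellation_distribution(probs):
    """P(exactly k cancellations) for independent bookings (Poisson-binomial)."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)      # this booking shows up
            nxt[k + 1] += mass * p        # this booking cancels
        dist = nxt
    return dist


def optimal_overbooking(probs, room_revenue, walk_cost):
    """Largest k such that selling the k-th extra room has non-negative expected value."""
    dist = cancellation_distribution(probs)
    threshold = walk_cost / (room_revenue + walk_cost)
    tail = 1.0  # P(cancellations >= 0)
    best = 0
    for k in range(1, len(dist)):
        tail -= dist[k - 1]  # tail is now P(cancellations >= k)
        if tail < threshold:
            break
        best = k
    return best


# Hypothetical night: 60 low-risk bookings and 40 high-risk bookings.
probs = [0.10] * 60 + [0.45] * 40
print(optimal_overbooking(probs, room_revenue=150, walk_cost=350))
```

Because this sketch assumes independence, it understates the tail risk when cancellations share a cause. With correlated cancellations, the same threshold rule applies, but the tail probabilities would need to come from a joint model or a simulation rather than from the independent distribution.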
How much does walking a guest cost?
Direct costs are $200-500 per walked guest: the cost of a room at a comparable hotel, transportation, and compensation (future stay credits, loyalty points). Indirect costs are harder to measure but significant: negative reviews, lost loyalty, and reduced future bookings. For loyalty program members, the lifetime value impact of a walk can be $5,000-$20,000. This is why prediction accuracy matters: optimal overbooking that minimizes walks while maximizing occupancy is a high-value optimization.
How does cancellation prediction handle group bookings?
Group bookings create correlated cancellation risk: when a group cancels, 10-200 rooms cancel simultaneously. Graph-based models represent group bookings as connected nodes, so if the group coordinator's behavior changes (reduced room block, delayed payment), the model increases cancellation probability for the entire block. This prevents the catastrophic scenario where overbooking based on independent estimates leaves the hotel with 50 empty rooms from a surprise group cancellation.
Can cancellation prediction help with rate strategy?
Yes. The model predicts cancellation probability by rate type, which informs the mix of flexible vs. non-refundable rates to offer. If the model predicts high cancellation risk for a specific date (weather-sensitive weekend), the hotel can shift more inventory to non-refundable rates at a discount, reducing cancellation volume while maintaining revenue. This rate-mix optimization is often worth more than overbooking optimization alone.
How quickly does cancellation probability change as check-in approaches?
Cancellation probability evolves significantly in the final 14 days before check-in. At 30 days out, a flexible booking might have a 40% cancellation probability. At 7 days, if the guest has not cancelled and weather looks good, it drops to 15%. At 48 hours, it drops to 5%. The model updates daily, allowing revenue managers to adjust overbooking levels as check-in approaches. The final 48-hour window is where the most expensive cancellations occur and where accurate prediction has the highest per-room value.
Bottom line: A hotel chain with 50,000 rooms recovers $40-70M annually by optimizing overbooking based on per-reservation cancellation predictions. Kumo's booking graph connects guest history, weather shifts, and event status changes to predict cancellations before they happen.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.