Churn Prediction: Why Your Model Is Missing the Customers Who Actually Leave

Most churn models predict the customers who were obviously going to leave anyway. The hard cases, the ones where intervention would actually make a difference, are the ones hiding in relational patterns your flat feature table cannot see.

TL;DR

  1. Most churn models predict the obvious cases (inactive users, failed payments) while missing the ambiguous ones where intervention would actually save revenue.
  2. The hard churn signals hide in relational patterns: team-level seat attrition, support ticket escalation sequences, product quality propagation across customers.
  3. On the H&M RelBench task, relational deep learning scored 69.88 AUROC vs. 55.21 for LightGBM with manual features, a 26.6% relative improvement from cross-table signals.
  4. Customer acquisition costs 5-25x more than retention. In gaming, publishers spend $15B annually on acquisition while 75% of new players churn within 24 hours.
  5. Churn prediction without explanation is useless. Cell-level attribution tells retention teams exactly which tickets, sessions, and account changes drive the risk.

Every churn model has the same dirty secret. It is very good at predicting customers who are already gone. The user who has not logged in for 60 days. The subscriber whose payment failed twice. The player who uninstalled the app. These predictions are correct and useless. By the time the model flags them, there is nothing left to save.

The customers who matter are the ambiguous ones. The subscriber who is still active but whose usage pattern just shifted. The B2B account where three of five seats have gone quiet. The retail customer whose order frequency dropped from weekly to monthly, but whose last order was their largest ever. These are the predictions that drive retention revenue. And most churn models miss them entirely.

What churn prediction is (and is not)

Churn prediction is a classification problem: for each customer, estimate the probability that they will stop being a customer within a defined time window (typically 30, 60, or 90 days). The output is a score between 0 and 1. The business sets a threshold and takes action on customers above it.
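In code, acting on the scores is a thresholding step. A minimal sketch, where the account IDs and probabilities are illustrative and the 0.5 cutoff stands in for whatever threshold the business picks:

```python
# Turn churn probabilities into an action list.
# Scores are hypothetical; in practice they come from the model.
scores = {
    "ACCT-201": 0.86,
    "ACCT-202": 0.33,
    "ACCT-203": 0.07,
}

THRESHOLD = 0.5  # set by the business, e.g. from retention-team capacity

# Accounts above the threshold, highest risk first.
at_risk = sorted(
    (acct for acct, p in scores.items() if p >= THRESHOLD),
    key=lambda acct: -scores[acct],
)
print(at_risk)  # ['ACCT-201']
```

The threshold is a business decision, not a modeling one: it trades off intervention cost against the value of a saved customer.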

The goal is not to predict churn. It is to predict preventable churn. A customer who leaves because they moved to a different country is not preventable. A customer who leaves because a competitor offered a 20% discount and your retention team never reached out is preventable. The model's value is proportional to the number of preventable churners it identifies before they leave.

The financial stakes are not small. Customer acquisition costs 5x to 25x more than retention across most industries. In gaming, publishers spend an estimated $15 billion annually on player acquisition. The industry average shows 75% of new players churn within 24 hours and 90% within 30 days. Every percentage point of early churn prevented has an outsized impact on lifetime value.

Here is what the data looks like for a B2B SaaS company. The churn signal hides across three tables.

accounts

| account_id | company | plan | mrr | contract_end |
|---|---|---|---|---|
| ACCT-201 | Pinnacle Health | Enterprise | $4,200 | 2026-03-01 |
| ACCT-202 | Vortex Media | Business | $890 | 2026-01-15 |
| ACCT-203 | Atlas Logistics | Enterprise | $7,800 | 2026-06-01 |

user_sessions

| session_id | account_id | user_email | date | duration | features_used |
|---|---|---|---|---|---|
| SS-01 | ACCT-201 | j.chen@pinnacle.com | 2025-11-01 | 42 min | 8 |
| SS-02 | ACCT-201 | m.wells@pinnacle.com | 2025-11-01 | 31 min | 5 |
| SS-03 | ACCT-201 | r.patel@pinnacle.com | 2025-10-15 | 4 min | 1 |
| SS-04 | ACCT-201 | k.davis@pinnacle.com | 2025-09-28 | 0 min | 0 |
| SS-05 | ACCT-201 | l.garcia@pinnacle.com | 2025-09-10 | 0 min | 0 |
| SS-06 | ACCT-202 | t.lee@vortex.com | 2025-11-10 | 18 min | 3 |

Highlighted: 2 of 5 Pinnacle Health users have gone completely inactive. A flat model shows 'avg_session_duration = 15 min' for the account. The relational model sees that 40% of seats are dark.

support_tickets

| ticket_id | account_id | subject | priority | status | created |
|---|---|---|---|---|---|
| TK-301 | ACCT-201 | SSO integration broken | Critical | Open | 2025-10-28 |
| TK-302 | ACCT-201 | Export feature not working | High | Open | 2025-11-02 |
| TK-303 | ACCT-201 | Requesting contract review | Medium | Open | 2025-11-08 |
| TK-304 | ACCT-203 | Add new user seats | Low | Resolved | 2025-10-20 |

Highlighted: Pinnacle Health has 3 open tickets in 11 days, escalating from technical issues to a contract review request. This sequence is a classic pre-churn pattern.

Why flat features miss the hard cases

A typical churn model is trained on a flat feature table. One row per customer. Here is what the data scientist builds from the three tables above.

flat_feature_table (what the churn model sees)

| account_id | plan | mrr | active_users | avg_session_min | tickets_30d | days_to_renewal |
|---|---|---|---|---|---|---|
| ACCT-201 | Enterprise | $4,200 | 3 | 25.7 | 3 | 113 |
| ACCT-202 | Business | $890 | 1 | 18.0 | 0 | 36 |
| ACCT-203 | Enterprise | $7,800 | 5 | 41.2 | 0 | 173 |

Highlighted: ACCT-201 shows 3 active users and 25.7 min average session duration. Looks healthy. But the raw data shows 2 of 5 users are completely inactive (0 min sessions), and the 3 open tickets escalate from technical issues to a contract review. The flat table hides both the seat attrition pattern and the ticket escalation sequence.
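The gap between the two views is easy to reproduce. A minimal pandas sketch using ACCT-201's five session rows from the tables above:

```python
import pandas as pd

# ACCT-201's session durations from the user_sessions table above.
sessions = pd.DataFrame({
    "account_id": ["ACCT-201"] * 5,
    "duration_min": [42, 31, 4, 0, 0],
})

# Flat feature: average duration over *active* users looks healthy.
active = sessions[sessions["duration_min"] > 0]
avg_active = active.groupby("account_id")["duration_min"].mean()

# Relational-style signal: what fraction of seats is completely dark?
dark_share = (
    sessions.assign(dark=sessions["duration_min"].eq(0))
    .groupby("account_id")["dark"].mean()
)

print(round(avg_active["ACCT-201"], 1))  # 25.7 min -- looks fine
print(dark_share["ACCT-201"])            # 0.4 -- 40% of seats are dark
```

The two numbers come from the same rows; the flat pipeline simply never computes the second one unless someone thinks to engineer it.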

The problem is what the features cannot express.

Cross-table patterns

A retail customer's churn risk depends not just on their purchase history, but on the products they bought. If those products had high return rates from other customers, or if the brands they preferred just had a quality scandal, the churn risk goes up. But that signal lives 2-3 hops away in the database: customer → orders → products → other customers' returns. No standard aggregation captures this.

Team-level dynamics

In B2B SaaS, churn is rarely an individual decision. It is a team decision. If 3 of 5 users on an account stop logging in, the remaining 2 are at extreme risk, even if their own usage looks healthy. But a per-user feature table cannot express "percentage of my teammates who have gone inactive." That requires traversing the user → account → other users path.
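The teammate signal requires the user → account → other users traversal. A sketch of that join in pandas, using the Pinnacle Health users from the tables above (activity flags follow the session data: two of five seats are dark):

```python
import pandas as pd

# Per-user activity flags for the ACCT-201 team.
users = pd.DataFrame({
    "user":    ["j.chen", "m.wells", "r.patel", "k.davis", "l.garcia"],
    "account": ["ACCT-201"] * 5,
    "active":  [True, True, True, False, False],
})

# Account-level totals, joined back onto every user row
# (the user -> account -> other users traversal).
totals = users.groupby("account")["active"].agg(n_active="sum", n_users="count")
users = users.merge(totals.reset_index(), on="account")

# Share of *teammates* (excluding self) that have gone inactive.
n_inactive = users["n_users"] - users["n_active"]
users["teammates_inactive"] = (
    (n_inactive - (~users["active"]).astype(int)) / (users["n_users"] - 1)
)

jchen = users.loc[users["user"] == "j.chen", "teammates_inactive"].item()
print(jchen)  # 0.5: half of j.chen's teammates have gone dark
```

A relational model traverses this path for every user automatically; in a flat pipeline, each such feature exists only if someone writes the join by hand.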

Temporal sequences, not aggregates

"5 orders in 30 days" is a feature. But it does not tell you whether those orders were evenly spaced (healthy cadence) or compressed into the first week followed by three weeks of silence (pre-churn pattern). The aggregate is identical. The sequence is completely different. And the sequence is what predicts churn.
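The point can be made concrete with two hypothetical order histories that share the same aggregate but diverge on a simple sequence statistic:

```python
from datetime import date

WINDOW_END = date(2025, 11, 30)  # end of the 30-day observation window

def trailing_silence_days(order_dates, end=WINDOW_END):
    """Days between the last order and the end of the window."""
    return (end - max(order_dates)).days

steady     = [date(2025, 11, d) for d in (1, 8, 15, 22, 29)]  # weekly cadence
front_load = [date(2025, 11, d) for d in (1, 2, 4, 5, 7)]     # burst, then silence

# Identical aggregate feature...
print(len(steady), len(front_load))  # 5 5
# ...completely different sequence signal.
print(trailing_silence_days(steady), trailing_silence_days(front_load))  # 1 23
```

Both customers produce the feature "5 orders in 30 days"; only the sequence view reveals that one of them has been silent for over three weeks.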

PQL Query

```
PREDICT accounts.plan = 'Cancelled'
FOR EACH accounts.account_id
```

The model reads accounts, user_sessions, and support_tickets as a graph. It discovers that Pinnacle Health's 40% inactive seats + 3 escalating tickets + contract review request is a high-risk combination.

Output

| account_id | churn_probability | top_signal |
|---|---|---|
| ACCT-201 | 0.86 | 40% inactive seats, escalating tickets, contract review |
| ACCT-202 | 0.33 | Small account, steady single-user engagement |
| ACCT-203 | 0.07 | Adding seats, resolved tickets, high engagement |

The relational advantage

A relational approach to churn prediction represents the database as a graph. Customers, orders, products, interactions, subscriptions, and support tickets become nodes. Foreign keys become edges. Timestamps establish ordering. The model traverses this graph to build each customer's prediction.

This changes what the model can see.

Product affinity signals

In the H&M dataset, the relational model discovered that customers who subscribed to the fashion newsletter but whose recent purchases were concentrated in sale items had higher churn rates. This is a 3-table pattern (customers → orders → products + customers → newsletter_subscriptions) that aggregation cannot express as a single feature. The model found it automatically by traversing the graph.

Community churn propagation

In gaming and social platforms, churn spreads through social graphs. When a player's guild members leave, the remaining player is more likely to leave. When a user's closest connections on a social platform go inactive, the user follows. A relational model sees this propagation directly: player → guild → other players → activity status. A flat model would need someone to engineer a "percentage of friends who churned" feature, which requires knowing to look for it in the first place.

Behavioral sequence matching

The graph preserves the full temporal sequence of events for each customer. The model can learn that the pattern "3 support tickets in a week, followed by a billing inquiry, followed by a settings change to downgrade" is a churn precursor. It matches this pattern across the customer base without anyone manually defining it as a feature.
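What the model learns is far richer than any hand-written rule, but the kind of pattern involved can be illustrated with a simple subsequence check over an event stream (event names follow the example in the paragraph above; the history is invented):

```python
def contains_pattern(events, pattern):
    """True if `pattern` occurs as an ordered (non-contiguous) subsequence."""
    it = iter(events)  # shared iterator enforces ordering across pattern steps
    return all(any(p == e for e in it) for p in pattern)

history = ["ticket", "ticket", "ticket", "billing_inquiry", "downgrade", "login"]
precursor = ["ticket", "billing_inquiry", "downgrade"]

print(contains_pattern(history, precursor))  # True
```

A relational model does not need the precursor spelled out: it learns which event orderings preceded churn across the customer base.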

Flat feature churn model

  • One row per customer, losing all relational structure
  • Predicts obvious churners (inactive, failed payments)
  • Misses cross-table patterns (product quality, team dynamics)
  • Temporal sequences destroyed by aggregation
  • H&M benchmark: 55.21 AUROC (LightGBM)

Relational churn model

  • Full graph: customers, orders, products, interactions
  • Finds ambiguous churners hiding in relational patterns
  • Traverses 3-4 hop paths automatically
  • Preserves temporal sequences with timestamped edges
  • H&M benchmark: 69.88 AUROC (RDL)

Industry-specific churn dynamics

Churn is not one problem. It manifests differently across industries, and the relational signals that predict it vary accordingly.

Gaming

Gaming has the most extreme churn profile of any industry. Studies consistently show that 75% of new mobile game players churn within 24 hours and 90% within 30 days. Publishers spend an estimated $15 billion annually on player acquisition, making early retention the single highest-leverage prediction problem in the industry.

The relational signals that predict gaming churn include: session duration trajectories (declining vs. stable), social graph density (players with active friends retain better), progression velocity (too fast or too slow both predict churn), and monetization patterns (first-purchase timing and amount are strongly predictive of long-term retention).

Retail and e-commerce

Retail churn is harder to define because there is no subscription to cancel. Churn is the absence of expected behavior: a customer who used to buy monthly has not bought in 90 days. The H&M RelBench task formalizes this as "will this customer make a purchase in the next 30 days?"

The relational signals include: product return rates of purchased items, category concentration (customers diversifying their purchases are more engaged than those narrowing), seasonal pattern alignment (a holiday shopper who misses Black Friday), and price sensitivity trends across orders.

B2B SaaS

SaaS churn is a team sport. The relational signals that matter most are at the account level, not the user level: seat utilization trends, feature adoption breadth, admin activity (declining admin logins are a leading indicator), integration usage (customers with 3+ active integrations churn at half the rate), and contract value trajectory (downgrades predict cancellation).

Financial services

Banking churn is predicted by cross-product relationships: a customer who moves their direct deposit is 6x more likely to close their account within 90 days. A customer who reduces their automatic bill pay relationships is signaling a shift to a competitor. These are relational signals that span the customer → accounts → transactions → payees path.

From prediction to intervention

A churn score without an explanation is a number that the retention team cannot act on. If the model says "this customer has an 82% probability of churning," the next question is always "why?" Without the why, the team cannot design an intervention.

Relational models provide the why automatically. The cell-level attribution traces the prediction back to specific data points: this customer is predicted to churn because their last 3 support tickets were unresolved (support_tickets table, rows 4521-4523), their team utilization dropped from 80% to 40% in the last 30 days (user_sessions table), and the product they rely on most was deprecated in the last release (product_changes table).

That is an intervention playbook, not a probability score. The customer success team knows exactly what to address: resolve the open tickets, set up a migration path for the deprecated feature, and reach out to the inactive team members. The specificity of the explanation determines the quality of the intervention.
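Structurally, the handoff from attribution to action is a lookup. A sketch with hypothetical attribution records mirroring the example above (the tables, weights, and playbook mapping are illustrative, not the product's output format):

```python
# Hypothetical attribution records; real ones come from the model.
attributions = [
    {"table": "support_tickets",  "detail": "3 unresolved tickets",          "weight": 0.41},
    {"table": "user_sessions",    "detail": "seat utilization 80% -> 40%",   "weight": 0.35},
    {"table": "product_changes",  "detail": "core feature deprecated",       "weight": 0.24},
]

# Map each attributed signal to a concrete retention action.
PLAYBOOK = {
    "support_tickets": "escalate and resolve open tickets",
    "user_sessions":   "re-engage inactive team members",
    "product_changes": "offer a migration path for deprecated features",
}

# Order interventions by how much each signal contributed to the risk score.
plan = [PLAYBOOK[a["table"]] for a in sorted(attributions, key=lambda a: -a["weight"])]
print(plan)
```

The value is in the ordering: the customer success team addresses the heaviest-weighted signal first.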

The benchmark evidence

The RelBench benchmark includes churn-style tasks across multiple domains. The consistent finding: relational approaches outperform flat-table approaches by wide margins on tasks where cross-table patterns carry signal.

  • H&M churn task: LightGBM 55.21 vs. RDL 69.88 AUROC (26.6% relative improvement)
  • KumoRFM zero-shot on RelBench classification: 76.71 average AUROC vs. 62.44 for LightGBM with manual features
  • KumoRFM fine-tuned: 81.14 average AUROC, a 30% relative improvement over the manual baseline
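The relative improvements quoted above follow directly from the AUROC figures:

```python
def rel_improvement(baseline, model):
    """Relative improvement of `model` over `baseline`, in percent."""
    return (model - baseline) / baseline * 100

print(round(rel_improvement(55.21, 69.88), 1))  # 26.6 (H&M churn task)
print(round(rel_improvement(62.44, 81.14), 1))  # 29.9 (~the 30% quoted above)
```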

The gap is not from a better algorithm running on the same data. It is from the same algorithm seeing more data. The relational model consumes the full graph structure. The flat model consumes a shadow of it.

Getting started

If you have a relational database with customer data and you want churn predictions, the path is straightforward. Connect your data warehouse (Snowflake, BigQuery, Databricks, or Redshift). Write a PQL query: "For each customer, what is the probability of churn in the next 30 days?" The foundation model reads your schema, builds the graph, traverses it, and returns predictions with explanations.

There is no feature engineering step. No model training step. No pipeline to build. The model that scores 76.71 AUROC zero-shot on RelBench is the same model that runs on your data. If you need higher accuracy on your specific domain, fine-tuning pushes toward 81+ AUROC and takes minutes, not months.

Your database already contains the signals that predict which customers will leave. The question is whether your churn model can see them.

Frequently asked questions

What is churn prediction?

Churn prediction is the use of machine learning to identify customers who are likely to stop using a product or service within a defined time window. A churn model assigns each customer a probability score (0 to 1), and the business uses a threshold to trigger intervention: targeted offers, proactive outreach, or escalation to customer success. The value of churn prediction is not in predicting who will leave, but in predicting who will leave and can still be saved.

Why do most churn models underperform?

Most churn models are trained on flat feature tables: one row per customer, columns like total_orders, avg_spend, days_since_last_login. These features capture the obvious cases (inactive users, declining spend) but miss the relational signals that predict ambiguous churn. A customer who is still active but whose teammates have churned, or whose product usage pattern matches historical churners, will not show up in flat features. These are the high-value prediction targets.

How does relational data improve churn prediction?

Relational data connects customers to their orders, products, interactions, support tickets, team members, and more. A graph-based model traverses these connections to find multi-hop churn signals: product return patterns, community behavior, support escalation sequences, team-level adoption trends. On the H&M retail dataset, a relational approach scored 69.88 AUROC compared to 55.21 for LightGBM with manual features, a 26.6% relative improvement driven entirely by relational context.

What is the business impact of improving churn prediction by 10-15 AUROC points?

A 10-15 point AUROC improvement means the model correctly identifies significantly more at-risk customers while generating fewer false positives. In practice, this translates to retention campaigns that target the right people: customers who are actually at risk and can still be saved. For a SaaS company with $100M ARR and 10% annual churn, cutting churn by a single percentage point (from 10% to 9%) preserves $1M in annual revenue. For gaming companies spending $15B annually on player acquisition, reducing early churn by a few percentage points has outsized impact.

How fast can I get a churn prediction model running with KumoRFM?

If your data is in a supported warehouse (Snowflake, BigQuery, Databricks, Redshift), you can have churn predictions running in minutes. You connect your data, write a PQL query like 'For each customer, what is the probability of churn in the next 30 days?', and the foundation model generates predictions. There is no feature engineering, no model training, and no pipeline to build. The model reads your relational schema directly and leverages pre-trained patterns from billions of relational data points.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.