
Churn Prediction with Graph Neural Networks

Why traditional models miss relational signals, how graph-based approaches close the gap, and a practical path from raw data to production churn predictions.

Kumo team · Ivaylo Bahtchevanov
01

What Is Churn and Why It Matters

Churn is the loss of a customer, subscriber, or user. The definition varies by industry: a subscription cancellation, an app uninstall, a lapsed purchase pattern, or a deactivated account. What stays constant is the business impact. Acquiring a new customer costs 5 to 25 times more than retaining an existing one, which makes even small improvements in retention disproportionately valuable.

The economics are straightforward. Re-engaging an at-risk customer is significantly cheaper than re-acquiring them after they leave. Converting low-engagement users into loyal, high-value customers through timely intervention can dramatically increase lifetime value. A few percentage points of improvement in retention compounds over time into substantial revenue gains.

Churn takes many forms

Churn is not always a binary event. It exists on a spectrum:

  • Hard churn: explicit actions like cancellations, unsubscribes, account closures, or app uninstalls.
  • Soft churn: gradual disengagement measured by declining sessions, fewer purchases, reduced time spent, or dropping click-through rates.

Both types require prediction, but soft churn is harder to detect because there is no single trigger event. Instead, you must identify patterns of declining activity across multiple signals over time.
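As a concrete illustration, declining activity can be turned into a score by fitting a trend line to each user's weekly signals. This is a minimal sketch, not a production metric; the weekly session counts and the normalization choice are illustrative assumptions:

```python
import numpy as np

def soft_churn_score(weekly_sessions):
    """Score gradual disengagement as the negative slope of weekly
    activity, normalized by the user's average activity level."""
    weeks = np.arange(len(weekly_sessions), dtype=float)
    sessions = np.asarray(weekly_sessions, dtype=float)
    if sessions.mean() == 0:
        return 1.0  # no activity at all: treat as fully disengaged
    slope = np.polyfit(weeks, sessions, 1)[0]  # linear trend per week
    # A declining trend (negative slope) maps to a positive risk score.
    return max(0.0, -slope / sessions.mean())

# A user whose sessions fall from 10/week to 2/week scores higher
# than a stable user with the same average activity.
declining = soft_churn_score([10, 8, 6, 4, 2])
stable = soft_churn_score([6, 6, 6, 6, 6])
```

In practice you would combine several such signals (sessions, purchases, time spent, click-through) rather than rely on any single one.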

The gaming industry: a case study in extreme churn

Nowhere is churn more severe than in gaming. Game developers spend approximately $15 billion annually on player acquisition, yet 75% of players churn within 24 hours and 90% churn within 30 days. The fundamental equation driving game monetization is that cost per install (CPI) must remain below lifetime value (LTV). When the vast majority of players leave before generating meaningful revenue, the math becomes punishing.
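The arithmetic is worth making explicit. With illustrative numbers (the CPI figure below is assumed for the example, not an industry statistic):

```python
# Illustrative unit economics: when most installs churn before paying,
# the surviving players must carry the entire acquisition cost.
cpi = 3.00           # cost per install (assumed for illustration)
retained_30d = 0.10  # 10% of players remain after 30 days (per the churn stats above)

# Break-even LTV required from each retained player:
required_ltv = cpi / retained_30d  # ≈ $30 per retained player
```

Every point of 30-day retention recovered directly lowers the LTV bar each surviving player must clear.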

This makes gaming one of the most demanding environments for churn prediction. Games with millions of daily active users generate hundreds of millions of contextual data points: in-game actions, social interactions, purchase behaviors, session patterns, and progression metrics. The window for intervention is narrow (often the first few play sessions), and the data is rich but noisy.

02

Traditional Approaches to Churn Prediction

Most churn prediction today relies on tabular ML models trained on hand-engineered features. A data scientist writes SQL queries to aggregate customer behavior into a flat feature table (days since last purchase, average order value, session count over 30 days, support ticket frequency), then trains a gradient-boosted model like XGBoost or LightGBM. This approach works, to a point.
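That workflow, in miniature: aggregate raw events into one flat row per customer, then fit a boosted model. The sketch below uses a toy pandas table and scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost/LightGBM; the column names and labels are illustrative:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Toy transactions table; in practice this comes from SQL over the warehouse.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 3, 3],
    "amount":      [20, 35, 15, 50, 10, 12],
    "days_ago":    [3, 10, 40, 80, 5, 6],
})

# Hand-engineered features: one flat row per customer.
features = tx.groupby("customer_id").agg(
    order_count=("amount", "size"),
    avg_order_value=("amount", "mean"),
    days_since_last=("days_ago", "min"),
)

# Labels: did the customer churn? (toy values for illustration)
labels = pd.Series([0, 1, 0], index=features.index)

model = GradientBoostingClassifier().fit(features, labels)
risk = model.predict_proba(features)[:, 1]  # churn probability per customer
```

Every aggregation in the `agg` call is a modeling decision made by a human, which is exactly the bottleneck discussed below.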

Regression analysis

The simplest approach uses logistic regression or survival analysis on aggregate features. In gaming, this means tracking metrics like sessions per week, in-app purchases, and level progression. The problem: these models break easily when game mechanics change. An update to the onboarding experience, a new feature release, or a seasonal event can invalidate the entire model. Regression requires a very nuanced understanding of player behavior, and the features must be manually recalibrated whenever the product evolves.

Deep learning on flat features

More sophisticated teams use deep learning models (LSTMs, attention networks) on sequences of user events. These can capture temporal patterns better than static features, but they still rely on handcrafted feature pipelines. Creating these features is time-intensive, introduces bias from the feature designer's assumptions, and is difficult to scale when processing millions of daily interactions. Every new use case requires a new set of features and a new round of model training.

Logistic Regression

Simple but brittle

  Pros:
  • Fast to train and interpret
  • Low computational cost
  • Good baseline for small datasets
  Cons:
  • Breaks when product changes
  • Cannot capture non-linear patterns
  • Requires manual feature recalibration

Gradient-Boosted Trees

Current industry standard

  Pros:
  • Strong on tabular data
  • Handles missing values well
  • Feature importance built in
  Cons:
  • Relies on hand-engineered features
  • Misses relational signals across tables
  • Each new task requires a new pipeline

Deep Learning (LSTM/Attention)

Powerful but expensive

  Pros:
  • Captures temporal sequences
  • Can model complex interactions
  • Handles variable-length histories
  Cons:
  • Handcrafted features still required
  • Difficult to scale to millions of users
  • Sparse training data degrades performance

The common limitation

All traditional approaches share the same bottleneck: they operate on a single flat feature table. To get there, you must flatten your relational database (customers, transactions, products, sessions, support tickets) into one row per customer. This flattening process discards the relational structure of the data. The connections between entities, the multi-hop patterns, and the temporal dynamics across tables are compressed into a handful of aggregated numbers.

A data scientist with a Stanford CS Master's degree and five years of experience needs approximately 12.3 hours and 878 lines of code for a single prediction task. And the process restarts from scratch for every new question you want to answer.

03

Why Relational and Graph Approaches Work Better

Customer behavior does not live in a single table. A customer who is about to churn leaves signals scattered across multiple tables: declining transaction frequency in the orders table, reduced session duration in the activity table, unresolved issues in the support table, and perhaps a shift in product preferences visible only through the product catalog. These signals are connected through foreign keys, and their combination tells the full story.

From tables to graphs

Graph Neural Networks (GNNs) represent your data as it actually exists: entities (customers, products, sessions, transactions) connected by relationships (purchased, viewed, contacted support, played with). Instead of flattening this structure, GNNs learn directly from it. Each entity aggregates information from its neighbors, capturing the context of its relationships.

This means a GNN can learn that a customer's churn risk depends not just on their own behavior, but on the behavior of similar customers, the characteristics of the products they buy, and the patterns in their support interactions. These multi-hop signals are exactly what gets lost in feature engineering.
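The core mechanic can be sketched in a few lines of NumPy. Each node updates its embedding by mixing its own state with the mean of its neighbors' embeddings, and stacking layers propagates information across multiple hops. This is a simplified mean-aggregation layer for intuition only, not Kumo's actual architecture:

```python
import numpy as np

def message_passing_layer(embeddings, adjacency, w_self, w_neigh):
    """One simplified GNN layer: each node mixes its own embedding
    with the mean of its neighbors' embeddings."""
    degree = adjacency.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1  # isolated nodes keep their own state
    neighbor_mean = adjacency @ embeddings / degree
    return np.tanh(embeddings @ w_self + neighbor_mean @ w_neigh)

rng = np.random.default_rng(0)
n_nodes, dim = 5, 8
embeddings = rng.normal(size=(n_nodes, dim))          # initial node states
adjacency = (rng.random((n_nodes, n_nodes)) < 0.4).astype(float)
np.fill_diagonal(adjacency, 0)

w_self = rng.normal(size=(dim, dim)) * 0.1
w_neigh = rng.normal(size=(dim, dim)) * 0.1

# Two stacked layers let information travel two hops,
# e.g. customer -> order -> product.
h = message_passing_layer(embeddings, adjacency, w_self, w_neigh)
h = message_passing_layer(h, adjacency, w_self, w_neigh)
```

In a real system the weights are learned end-to-end against the prediction target, and nodes of different types (customers, orders, products) carry different feature sets.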

Key advantages of graph-based churn prediction

  • No manual feature engineering: the model learns which relationships and patterns matter directly from the data, eliminating the guesswork of which features to create.
  • Multi-hop pattern discovery: GNNs traverse connections across multiple tables (customer → orders → products → categories), finding combinations that no human would manually encode.
  • Cold-start capability: for new users with limited history, the graph structure provides context through their connections to other entities. A new player's behavior can be interpreted through the lens of similar players who joined through the same channel, played the same first levels, or share demographic attributes.
  • Temporal dynamics: graph-based models can encode the time dimension natively, distinguishing between a customer whose purchases are accelerating versus decelerating, not just counting total purchases.

What the benchmarks show

On the H&M retail dataset (3 tables, 16.6 million rows), a graph-based approach surfaces churn signals that flat models miss entirely. The model identifies that the interaction between low order count, no fashion news subscription, and no club membership drives churn probability. A feature engineer might include each as a separate column, but the three-way combination across columns is what matters. On the RelBench user-churn task for this dataset, graph-based models achieve 69.88 AUROC compared to 55.21 for LightGBM with manual features.

Churn prediction performance across datasets (AUROC, higher is better)
Dataset        Task          LightGBM (manual features)   Graph-Based Model
rel-amazon     user-churn    52.22                        70.42
rel-amazon     item-churn    62.54                        82.81
rel-hm (H&M)   user-churn    55.21                        69.88

04

Churn Prediction in Gaming

Gaming presents unique challenges that make it an ideal proving ground for graph-based churn prediction. The data is massive (millions of daily active users generating hundreds of millions of interactions), the churn rates are extreme (75% in 24 hours), and the intervention windows are measured in hours, not weeks.

Why gaming churn is different

Unlike subscription businesses where churn is a discrete event (cancellation), gaming churn is gradual. Players stop logging in, reduce session length, stop making purchases, or abandon progression. The signals are distributed across gameplay data, social features, economic systems, and engagement metrics. No single feature captures it.

Games also have asymmetric player interactions. A player in a multiplayer game is connected to teammates, opponents, guild members, and friends. Their churn risk is influenced by the activity of these connections. If a player's entire guild goes inactive, their own churn probability spikes, even if their individual metrics look healthy.

Real-world examples

Rovio, the maker of Angry Birds with over 10 million daily active users, uses machine learning to predict churn and dynamically adjust game difficulty. When the model detects a player is likely to churn, the game can reduce difficulty, offer rewards, or trigger social features to re-engage them. This kind of real-time, personalized intervention requires accurate churn predictions delivered at scale.

Researchers have also trained state-of-the-art agents using GNNs for games like StarCraft, demonstrating that graph-based approaches can capture the complex relational dynamics of competitive gaming environments.

The cold-start problem in gaming

New players present the biggest challenge and the biggest opportunity. With no historical behavior, traditional models have nothing to work with. Graph-based models solve this by leveraging the new player's connections: the acquisition channel they came through, the first actions they took, and the behavior patterns of similar players who entered the game in comparable ways.

GNNs can forecast a new player's behavior before any history exists for that player, which is critical in gaming where the first few sessions determine whether a player stays or leaves.
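A toy version of this idea: with no history for a new player, borrow a prior from the players they are connected to. The names, risk scores, and fallback base rate below are illustrative assumptions, not a real scoring scheme:

```python
import numpy as np

# Churn risk already estimated for existing players (e.g. by a trained model).
risk = {"alice": 0.85, "bob": 0.80, "carol": 0.10, "dan": 0.15}

def cold_start_risk(neighbors, risk, base_rate=0.5):
    """A new player with no history inherits a prior from connected
    players: same acquisition channel, same first levels, shared guild."""
    known = [risk[n] for n in neighbors if n in risk]
    return float(np.mean(known)) if known else base_rate

# New player who joined through the same channel as two high-risk players:
high = cold_start_risk(["alice", "bob"], risk)
# New player connected to engaged, low-risk players:
low = cold_start_risk(["carol", "dan"], risk)
```

A GNN does this implicitly and with learned weights, but the intuition is the same: structure substitutes for history.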

05

How Kumo Approaches Churn Prediction

Kumo enables teams to go from raw relational data to production churn predictions with minimal time-to-value. The platform eliminates the manual feature engineering bottleneck by operating directly on connected data tables through a four-stage process.

1

Connect Data Tables

Link relational tables (customers, transactions, sessions, products) through primary and foreign key relationships. The platform scales to dozens of tables with terabytes of data and tens of billions of rows.

2

Write Predictive Query

Specify the churn prediction task in Kumo's Predictive Query Language (PQL). No feature engineering required. The query defines what you want to predict, the time windows, and the target population.

3

AutoML Pipeline

The system automatically identifies optimal architectures, models, and hyperparameters. Multiple prediction tasks can run in parallel within hours.

4

Analyze and Deploy

Evaluation dashboards show performance metrics, feature contributions, and churn drivers. Predictions deploy to production with ongoing monitoring.

Predictive Query Language for churn

Kumo's PQL lets you define churn predictions as simple queries without writing feature engineering code. Here are practical examples:

Session-based churn

Predict which active users will have zero sessions in the next 90 days:

PREDICT NOT EXISTS(Sessions, *, 0, 90)
WHERE EXISTS(Sessions, *, -90, 0)
FOR EACH Users.ID

Transaction-based churn

Predict which active customers will stop transacting:

PREDICT NOT EXISTS(Trans, *, 0, 90)
WHERE EXISTS(Trans, *, -90, 0)
FOR EACH Users.ID

Scenario testing with interventions

Test the impact of a coupon on churn probability:

PREDICT NOT EXISTS(Trans, *, 0, 90)
WHERE EXISTS(Trans, *, -90, 0)
FOR EACH Users.ID
ASSUMING EXISTS(Coupons, *, 0, 7)

The ASSUMING clause lets you simulate interventions: what would churn look like if we sent this customer a coupon in the next 7 days? This enables teams to estimate the causal impact of retention strategies before deploying them.

06

Understanding What Drives Churn

Predicting churn is only useful if you can act on it. Kumo's platform provides explainability at multiple levels, enabling teams to understand not just who will churn, but why they are at risk.

Feature contribution analysis

After predictions are generated, evaluation dashboards show which features contribute most to churn predictions. The platform identifies specific indicators like transaction frequency, postal code, membership status, session recency, and product category preferences as meaningful churn drivers.

For the H&M retail dataset, the model highlights that the combination of order count + fashion news subscription status + club membership is a primary churn signal. Users with few past orders who also lack a fashion news subscription and active club membership show dramatically higher churn probability. This three-way interaction is precisely the kind of pattern that manual feature engineering misses.
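A synthetic demonstration of why the interaction matters more than the individual columns. In the data below (generated for illustration, not H&M data), churn spikes only when all three flags co-occur, so each flag in isolation looks only mildly predictive:

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
df = pd.DataFrame({
    "low_order_count": rng.random(n) < 0.5,
    "no_fashion_news": rng.random(n) < 0.5,
    "no_club_member":  rng.random(n) < 0.5,
})

# Synthetic labels: churn probability jumps only when ALL THREE flags co-occur.
triple = df.all(axis=1)
df["churned"] = rng.random(n) < np.where(triple, 0.60, 0.10)

# Each flag alone looks mildly predictive...
marginal = df.groupby("low_order_count")["churned"].mean()
# ...but the three-way segment carries the real signal.
segment = df.groupby(triple)["churned"].mean()
```

A model fed the three columns as independent features sees only the diluted marginal effect; a model that can represent the joint segment recovers the full signal.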

Temporal pattern detection

The platform goes beyond simple recency metrics. Rather than just tracking “days since last order,” it captures the full temporal pattern: are purchases accelerating, decelerating, or clustering? This is possible because the model operates on raw transaction sequences, not pre-aggregated numbers.
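One way to capture this, sketched with illustrative timestamps: fit a trend to the gaps between successive purchases, so two customers with identical recency and order counts can still be told apart.

```python
import numpy as np

def purchase_trend(timestamps_days):
    """Positive value = gaps between purchases are widening (decelerating);
    negative = gaps are shrinking (accelerating)."""
    ts = np.sort(np.asarray(timestamps_days, dtype=float))
    gaps = np.diff(ts)
    if len(gaps) < 2:
        return 0.0  # not enough history to estimate a trend
    # Slope of gap length over successive purchases.
    return np.polyfit(np.arange(len(gaps)), gaps, 1)[0]

# Same number of purchases, opposite trajectories:
decelerating = purchase_trend([0, 10, 25, 45, 70])  # gaps: 10, 15, 20, 25
accelerating = purchase_trend([0, 25, 45, 60, 70])  # gaps: 25, 20, 15, 10
```

A "days since last order" feature would treat these customers similarly; the trend separates the one drifting away from the one ramping up.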

Segment-level and individual-level analysis

Teams can compare performance against baseline models, identify patterns across user segments, and diagnose root causes of churn such as feature gaps, expectation misalignment, or competitive pressure. The analysis enables targeted retention strategies: different interventions for different churn drivers.

From prediction to action

The practical applications extend beyond basic churn scoring:

  • Outreach optimization: determine which customers respond best to which retention offers.
  • Segment response prediction: forecast how different user segments will react to product changes.
  • High-value customer identification: prioritize retention efforts on customers with the highest potential LTV.
  • In gaming: personalization, ARPU maximization, LTV forecasting, difficulty adjustment, and NPS improvement, all from the same platform and data graph.

07

Practical Implementation

Deploying graph-based churn prediction in production requires connecting your data, defining the prediction task, and configuring the AutoML pipeline. Here is what the process looks like with Kumo.

Step 1: Graph construction

Connect your relational data tables (customer information, transactions, subscriptions, product catalogs, session logs) through primary and foreign key relationships. Kumo's platform maintains and updates the graph automatically as new data arrives. The system scales to dozens of tables containing terabytes of data and tens of billions of rows.

Step 2: Define predictions with PQL

Write predictive queries that specify the business question. No feature engineering, no SQL aggregation pipelines, no manual feature stores. The platform handles feature discovery automatically.

Step 3: Configure AutoML

The AutoML pipeline optimizes model architecture, hyperparameters, and ensemble strategies. A typical configuration includes:

Example AutoML configuration for churn prediction
Parameter             Value
Trial runs            8
Search strategy       Bayesian
Evaluation metrics    AUROC, AUPRC, Precision@100, Recall@100
Model ensembles       3
Max training epochs   100
Steps per epoch       Up to 2,000

Step 4: Evaluate and iterate

Review the evaluation dashboard to compare performance against baselines, analyze feature contributions, and validate predictions across user segments. The platform supports multiple prediction tasks in parallel, so you can test different churn definitions (session-based vs. transaction-based) or different time windows (30-day vs. 90-day) simultaneously.

Step 5: Deploy and monitor

Kumo deploys natively on Snowflake (Snowpark Container Services) and Databricks (Lakehouse App). The Python SDK provides customization options for advanced users who want to integrate predictions into existing workflows.

Beyond churn: a reusable platform

Once your data is connected, the same graph supports multiple prediction tasks without rebuilding pipelines. Churn prediction, recommendation, LTV forecasting, fraud detection, demand prediction, and personalization all operate on the same relational structure. In gaming, this means a single platform handles player churn, in-game recommendations, dynamic pricing, matchmaking optimization, and engagement scoring.

Try KumoRFM on your own data

Zero-shot predictions are free. Fine-tuning is available with a trial.