Solution Background and Business Value

Customer churn prediction helps businesses retain users by identifying those at risk of leaving and taking proactive steps to re-engage them. This is particularly useful for subscription-based services, e-commerce platforms, and streaming services.

By using Kumo AI, businesses can:

  • Train a churn model tailored to their data and customer behavior.

  • Use the Kumo REST API to export predictions to a CRM system.

  • Send targeted email, SMS, or push notifications to customers likely to churn.

Kumo’s Predictive Query Language (PQL) allows for flexible churn definitions, including:

  • Subscription churn: Predict users likely to cancel within the next 3 months.

  • Inactivity churn: Predict users unlikely to log in within the next 7 days.

  • Purchase churn: Predict users unlikely to make a purchase in the next 30 days.

Data Requirements and Schema

Start with a core set of three tables; further tables can be added later to improve prediction quality.

Core Tables

  1. Users Table

    • Stores customer information.

    • Key attributes:

      • user_id: Unique identifier for each user.

      • Optional: Signup date, subscription status, location.

  2. Events Table

    • Tracks user activity (e.g., purchases, logins, video streams).

    • Key attributes:

      • user_id: Links to a user.

      • timestamp: Time of event.

      • Optional: item_id (links to an item), event type (purchase, session start, stream start).

  3. Items Table

    • Contains details about products or content.

    • Key attributes:

      • item_id: Unique identifier.

      • Optional: Product category, price, genre.
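
The core tables can be mocked up locally to check that keys line up before connecting any data. The following is a minimal sketch in plain Python, not Kumo code: the class names, the sample rows, and the helper find_orphan_events are hypothetical, and the item_id column on events is assumed to be the one used as a foreign key in the graph definition later in this guide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class User:
    user_id: str                      # primary key
    signup_date: Optional[datetime] = None


@dataclass
class Item:
    item_id: str                      # primary key
    category: Optional[str] = None


@dataclass
class Event:
    user_id: str                      # foreign key to User.user_id
    timestamp: datetime
    item_id: Optional[str] = None     # foreign key to Item.item_id (optional)
    type: Optional[str] = None


def find_orphan_events(users, items, events):
    """Return events whose user_id or item_id has no matching row."""
    user_ids = {u.user_id for u in users}
    item_ids = {i.item_id for i in items}
    return [
        e for e in events
        if e.user_id not in user_ids
        or (e.item_id is not None and e.item_id not in item_ids)
    ]


users = [User("u1"), User("u2")]
items = [Item("i1", category="books")]
events = [
    Event("u1", datetime(2024, 1, 5), item_id="i1", type="purchase"),
    Event("u2", datetime(2024, 1, 6), type="session"),
    Event("u9", datetime(2024, 1, 7), item_id="i1", type="purchase"),  # unknown user
]

orphans = find_orphan_events(users, items, events)
print([e.user_id for e in orphans])  # ['u9']
```

Events that reference missing users or items are worth cleaning up early, since they cannot be linked to any entity in the graph.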

Additional Tables (Optional Enhancements)

  • Merchants Table: Details about merchants in a marketplace.

  • Sessions Table: Session start and end times for users.

  • Clicks Table: User interactions with specific items.

  • Reviews Table: User-generated product reviews.

Entity Relationship Diagram (ERD)

The Events table references the Users table through user_id and the Items table through item_id.

Predictive Queries

A user is considered churned if they become inactive within a given time window. The three queries below define churn in different ways; X and Y are placeholders for a number of days.

1. Predicting Purchase Churn

PREDICT COUNT(events.*, 0, X, days) = 0
FOR EACH users.user_id
WHERE COUNT(events.*, -Y, 0, days) > 0

This predicts users who will have no events (no purchases, if the Events table records only purchases) in the next X days, given that they had at least one event in the last Y days.
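
To make the semantics of this query concrete, the sketch below computes the same label by hand for a single anchor time. It is plain Python for illustration only, not how Kumo builds its training table: the function name purchase_churn_labels and the sample data are hypothetical, and treating each window as open at the start and closed at the end is an assumption.

```python
from datetime import datetime, timedelta


def purchase_churn_labels(events, anchor, horizon_days, lookback_days):
    """events: iterable of (user_id, timestamp). Returns {user_id: churned}."""
    lookback_start = anchor - timedelta(days=lookback_days)
    horizon_end = anchor + timedelta(days=horizon_days)
    past, future = {}, {}
    for user_id, ts in events:
        if lookback_start < ts <= anchor:
            past[user_id] = past.get(user_id, 0) + 1      # COUNT(events.*, -Y, 0, days)
        elif anchor < ts <= horizon_end:
            future[user_id] = future.get(user_id, 0) + 1  # COUNT(events.*, 0, X, days)
    # WHERE keeps only users with past activity; the target is "zero future events".
    return {user_id: future.get(user_id, 0) == 0 for user_id in past}


events = [
    ("u1", datetime(2024, 2, 10)),  # active before the anchor
    ("u1", datetime(2024, 3, 15)),  # and active again afterwards
    ("u2", datetime(2024, 1, 20)),  # active before, silent afterwards
    ("u3", datetime(2023, 10, 1)),  # too old to pass the WHERE filter
    ("u3", datetime(2024, 3, 5)),
]

labels = purchase_churn_labels(
    events, anchor=datetime(2024, 3, 1), horizon_days=30, lookback_days=90
)
print(labels)  # {'u1': False, 'u2': True}
```

User u3 is absent from the result because the WHERE clause excludes users with no activity in the lookback window, regardless of what they do later.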

2. Predicting Streaming/Inactivity Churn

PREDICT COUNT(events.* WHERE events.type = 'stream', 0, X, days) = 0
FOR EACH users.user_id
WHERE COUNT(events.* WHERE events.type = 'session', -Y, 0, days) > 0

This predicts users who will not stream content in the next X days, given that they had active sessions in the last Y days.

3. Predicting Subscription Churn

PREDICT COUNT(events.* WHERE events.type = 'unsubscribe', 0, X, days) > 0
FOR EACH users.user_id
WHERE LAST(users.subscription_status, -Y, 0, days) = 'active'

This predicts users who will unsubscribe in the next X days, given that their most recent subscription status over the last Y days was active.
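
Because X and Y are placeholders, each query needs concrete day counts before it can be submitted. One way to do this is to generate the query text from parameters, as in this minimal sketch. The helper purchase_churn_query is hypothetical; the 30-day horizon matches the purchase churn example in the introduction, and the 90-day lookback is an arbitrary illustrative value.

```python
def purchase_churn_query(horizon_days: int, lookback_days: int) -> str:
    """Fill the X (horizon) and Y (lookback) placeholders of the purchase churn query."""
    if horizon_days <= 0 or lookback_days <= 0:
        raise ValueError("horizon_days and lookback_days must be positive")
    return (
        f"PREDICT COUNT(events.*, 0, {horizon_days}, days) = 0\n"
        f"FOR EACH users.user_id\n"
        f"WHERE COUNT(events.*, -{lookback_days}, 0, days) > 0"
    )


query = purchase_churn_query(horizon_days=30, lookback_days=90)
print(query)
```

Keeping the windows as function arguments makes it easier to compare several churn definitions against the same graph.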

Building Models in the Kumo SDK

1. Initialize the Kumo SDK

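import os
import kumoai as kumo

# Assumption: the API key is read from an environment variable named KUMO_API_KEY.
API_KEY = os.environ["KUMO_API_KEY"]
kumo.init(url="https://<customer_id>.kumoai.cloud/api", api_key=API_KEY)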

2. Connect data

connector = kumo.S3Connector("s3://your-dataset-location/")

3. Select tables

users = kumo.Table.from_source_table(
    source_table=connector.table('users'),
    primary_key='user_id',
).infer_metadata()

events = kumo.Table.from_source_table(
    source_table=connector.table('events'),
    time_column='timestamp',
).infer_metadata()

items = kumo.Table.from_source_table(
    source_table=connector.table('items'),
    primary_key='item_id',
).infer_metadata()

4. Define graph schema

graph = kumo.Graph(
    tables={
        'users': users,
        'events': events,
        'items': items,
    },
    edges=[
        dict(src_table='events', fkey='user_id', dst_table='users'),
        dict(src_table='events', fkey='item_id', dst_table='items'),
    ],
)

graph.validate(verbose=True)

5. Train the model

# Replace X and Y in the query with concrete day counts before running.
pquery = kumo.PredictiveQuery(
    graph=graph,
    query="""
    PREDICT COUNT(events.*, 0, X, days) = 0
    FOR EACH users.user_id
    WHERE COUNT(events.*, -Y, 0, days) > 0
    """
)

pquery.validate(verbose=True)

model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)
training_job = trainer.fit(
    graph=graph,
    train_table=pquery.generate_training_table(non_blocking=True),
    non_blocking=False,
)
print(f"Training metrics: {training_job.metrics()}")

Deployment Strategy

In production, churn predictions typically feed an automated retention workflow:

  1. Generate churn scores using Kumo.

  2. Filter users based on churn risk and store the scores.

  3. Export churn scores to CRM tools (e.g., Salesforce, Marketo, Braze).

  4. Trigger personalized engagement (e.g., emails, push notifications, discounts).

  5. Automate the process using workflow orchestration tools (e.g., Airflow, Dagster).
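
Steps 2 and 3 above can be sketched as a small post-processing function that runs once the scores have been exported. This is an illustration in plain Python, not part of the Kumo SDK: the 0.7 threshold, the record field names, and the segment label are hypothetical, and the actual upload would go through the API of whichever CRM is in use.

```python
import json


def select_at_risk(scores, threshold=0.7):
    """scores: {user_id: churn probability}. Returns (user_id, score) pairs, highest risk first."""
    at_risk = [(user_id, score) for user_id, score in scores.items() if score >= threshold]
    return sorted(at_risk, key=lambda pair: pair[1], reverse=True)


def to_crm_records(at_risk, segment="high_churn_risk"):
    """Shape the filtered scores into generic records ready for a CRM import."""
    return [
        {"user_id": user_id, "churn_score": round(score, 3), "segment": segment}
        for user_id, score in at_risk
    ]


scores = {"u1": 0.91, "u2": 0.35, "u3": 0.74}  # sample churn scores
records = to_crm_records(select_at_risk(scores))
print(json.dumps(records, indent=2))
```

A fixed threshold is the simplest choice; selecting the top N users by score is a common alternative when the retention budget is capped.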