Solution Background and Business Value

Modern software platforms, especially those with rich feature sets, continuously release new capabilities to enhance customer value. By predicting early adoption of these features, businesses can focus marketing and sales efforts on high-intent users, personalize outreach, and accelerate time-to-value. These predictions can improve customer experience and drive product engagement, and they can also unlock faster revenue growth, better resource allocation, and stronger feedback loops for product development.

However, tracking adoption across features and identifying which customers are most likely to try them is a complex task. The difficulty lies in processing massive volumes of telemetry data, interpreting diverse customer behaviors, and adapting to the rapid pace of feature releases. Kumo AI is purpose-built to address these challenges, offering a platform designed for behavior-driven modeling at scale. By leveraging **graph-based learning** and rich feature engineering, Kumo uncovers subtle, non-obvious patterns in telemetry data that traditional rule-based or static models fail to capture. There are many ways to train such a model; this guide walks through one example, a multi-label classification model that predicts which of a set of pre-defined features each user will adopt in the next N days.

Data Requirements and Schema:

To develop an effective feature adoption prediction model, we need a structured dataset that captures the relevant user and feature data. A minimum set of tables is required to train the model, and adding further relevant information to the graph generally helps accuracy. This example consists of two main tables, a users table and a product success table, plus any number of telemetry tables. Each entry in the users table represents an entity that the model predicts over, and each entry in the product success table represents the first adoption of a feature by a user. The telemetry tables give the model information about each user's behavior; in general, the more relevant tables provided, the better the model tends to perform.

Core Tables:
  1. Users:
    • Each entry represents a user
    • Key attributes:
      • user_id: unique user identifier
      • Optional: User attributes such as region, age, demographics, etc.
  2. Product Success:
    • Stores information about when each user first used a given feature
    • Note: Only tracks the first adoption
    • Key attributes:
      • success_id: unique success identifier
      • date_of_success: date that the user first adopted a feature
      • user_id: the user that used the feature
      • feature: name of the feature used
      • use_case: how the feature was used
      • Optional: Other feature attributes
  3. User Telemetry Tables:
    • Add as many relevant telemetry tables as possible for each user; a richer graph generally gives the model more signal to learn from
    • Examples for telemetry tables include user dimensions, user sessions, transactions, etc.
    • Key attributes:
      • user_id: user identifier
      • Attributes related to each table
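
As a rough illustration of the schema above, the following Python sketch builds toy versions of the three table types with pandas and runs a few sanity checks before the data is loaded anywhere. The column names follow the lists above; the sample values, the `sessions` telemetry table, and the `validate_schema` helper are hypothetical and are not part of Kumo.

```python
import pandas as pd

# Toy data: every value below is invented for illustration.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "region": ["EU", "US", "US"],
})

product_success = pd.DataFrame({
    "success_id": [10, 11, 12],
    "date_of_success": pd.to_datetime(["2024-01-05", "2024-01-09", "2024-02-01"]),
    "user_id": [1, 1, 2],
    "feature": ["dashboards", "alerts", "dashboards"],
    "use_case": ["reporting", "monitoring", "reporting"],
})

# One example telemetry table (hypothetical name and columns).
sessions = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "session_start": pd.to_datetime(
        ["2024-01-01", "2024-01-03", "2024-01-20", "2024-01-15"]
    ),
    "duration_minutes": [12, 30, 5, 48],
})


def validate_schema(users, product_success, telemetry_tables):
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    if not users["user_id"].is_unique:
        problems.append("users.user_id is not unique")
    if not product_success["success_id"].is_unique:
        problems.append("product_success.success_id is not unique")
    # Only the first adoption is tracked, so each (user, feature) pair
    # should appear at most once.
    if product_success.duplicated(["user_id", "feature"]).any():
        problems.append("product_success has repeated (user_id, feature) pairs")
    # Every table must link back to a known user.
    known_users = set(users["user_id"])
    tables = {"product_success": product_success, **telemetry_tables}
    for name, table in tables.items():
        if not set(table["user_id"]) <= known_users:
            problems.append(f"{name} references unknown user_id values")
    return problems


problems = validate_schema(users, product_success, {"sessions": sessions})
```

With the toy data above, `problems` comes back empty; a telemetry row pointing at a user that is missing from the users table would be reported by name.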
Entity Relationship Diagram (ERD):

Multi-label Classification vs. Binary Classification:

With this graph structure, there are two main ways to train a feature adoption prediction model: binary classification and multi-label classification. A binary classification model must be trained separately for each feature and outputs whether or not a user will adopt that one feature. A multi-label classification model, on the other hand, predicts adoption of several features at the same time. Because the labels are learned jointly, the model can pick up signals shared across features, which can improve its accuracy. Additionally, a single multi-label model covers every feature, which removes the need to train and maintain one model per feature and greatly reduces the effort needed to bring the model into production.
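
The difference between the two setups can be seen in the shape of the training targets. The sketch below is an illustration in pandas, not Kumo's internal representation; the feature names and user IDs are made up. It builds one 0/1 matrix with a column per feature, which is the multi-label target, and then slices it into per-feature columns, each of which would be the target of a separate binary model.

```python
import pandas as pd

# Hypothetical first-adoption records (user 3 has adopted nothing).
user_ids = [1, 2, 3]
adoptions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "feature": ["dashboards", "alerts", "dashboards"],
})

# Multi-label target: one row per user, one 0/1 column per feature.
multi_label = (
    pd.crosstab(adoptions["user_id"], adoptions["feature"])
    .clip(upper=1)                            # guard against repeated rows
    .reindex(index=user_ids, fill_value=0)    # keep users with no adoptions
)

# Binary targets: one label vector, and therefore one model, per feature.
binary_targets = {
    feature: multi_label[feature] for feature in multi_label.columns
}
```

Here `multi_label` has one column per feature, while `binary_targets` holds one separate label vector per feature, so the number of models to train and deploy grows with the number of features in the binary setup.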

Predictive Query

This **predictive query** creates the training table for a multi-label classification model that predicts, for each user, up to K features the user is most likely to adopt in the next N days.
PREDICT LIST_DISTINCT(product_success.feature, 0, N, DAYS)
CLASSIFY TOP K
FOR EACH users.user_id
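
To make the target of this query more concrete, the sketch below mimics in pandas what a `LIST_DISTINCT` label over a forward-looking N-day window could look like for a single anchor time. It only illustrates the idea and is not how Kumo builds its training table: the `list_distinct_labels` helper, the sample data, and the treatment of the window as exclusive at the start and inclusive at the end are all assumptions, and the `CLASSIFY TOP K` step is not modeled.

```python
import pandas as pd


def list_distinct_labels(product_success, user_ids, anchor_time, n_days):
    """Distinct features each user first adopts in
    (anchor_time, anchor_time + n_days]. Boundary handling is an assumption."""
    window_end = anchor_time + pd.Timedelta(days=n_days)
    dates = product_success["date_of_success"]
    in_window = product_success[(dates > anchor_time) & (dates <= window_end)]

    # Start every user with an empty label set so that users with no
    # adoptions in the window still get a (negative) training example.
    labels = {uid: set() for uid in user_ids}
    for uid, feature in zip(in_window["user_id"], in_window["feature"]):
        if uid in labels:
            labels[uid].add(feature)
    return {uid: sorted(features) for uid, features in labels.items()}


# Hypothetical first-adoption records.
product_success = pd.DataFrame({
    "date_of_success": pd.to_datetime(["2024-01-05", "2024-01-09", "2024-02-01"]),
    "user_id": [1, 1, 2],
    "feature": ["dashboards", "alerts", "dashboards"],
})

labels = list_distinct_labels(
    product_success,
    user_ids=[1, 2, 3],
    anchor_time=pd.Timestamp("2024-01-01"),
    n_days=14,
)
```

In this toy example, user 1 adopts two features inside the 14-day window, while users 2 and 3 receive empty label lists; user 2's adoption on 2024-02-01 falls outside the window.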