Solution Background and Business Value

Buy-it-again recommendations enhance customer experience by making relevant products easily accessible while also driving business growth. These recommendations:

  • Increase repeat purchases by reminding users of past buys.

  • Boost customer retention by keeping users engaged.

  • Optimize marketing campaigns by personalizing push notifications, in-app recommendations, and emails.

Together, these effects help a business stay top of mind with customers and can improve conversion rates and brand loyalty.

Data Requirements and Schema

To develop an effective buy-it-again recommendation model, we need three core tables: Users, Items, and Transactions. This is the minimum dataset; Kumo AI also lets us improve the model by adding further tables or columns as extra signals.

Core Tables

  1. Users Table

    • Stores user details.

    • Key attributes:

      • user_id: Unique identifier (Primary Key).

      • join_timestamp: When the user joined.

      • age, location, other_features: Optional user attributes.

  2. Items Table

    • Stores product details.

    • Key attributes:

      • item_id: Unique identifier (Primary Key).

      • item_name, category: Product metadata.

      • start_timestamp / end_timestamp: Item availability.

      • price, color, other_features: Additional item features.

  3. Transactions Table

    • Stores user purchase history.

    • Key attributes:

      • transaction_id: Unique identifier (Primary Key).

      • user_id: Foreign Key linking to Users.

      • item_id: Foreign Key linking to Items.

      • timestamp: Purchase date.

      • total_amount, payment_method, other_features: Transaction metadata.
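The three core tables can be sketched as a small SQLite schema, which makes the primary and foreign keys explicit and gives a quick way to spot transactions that point at missing users or items. This is only an illustration: the column types, the helper names `create_schema` and `orphan_transactions`, and the choice of SQLite are assumptions, not part of Kumo.

```python
import sqlite3

# Assumed column types; the schema description only names the columns.
SCHEMA = """
CREATE TABLE users (
    user_id        TEXT PRIMARY KEY,
    join_timestamp TEXT NOT NULL,
    age            INTEGER,
    location       TEXT
);
CREATE TABLE items (
    item_id         TEXT PRIMARY KEY,
    item_name       TEXT,
    category        TEXT,
    start_timestamp TEXT,
    end_timestamp   TEXT,
    price           REAL
);
CREATE TABLE transactions (
    transaction_id TEXT PRIMARY KEY,
    user_id        TEXT NOT NULL REFERENCES users(user_id),
    item_id        TEXT NOT NULL REFERENCES items(item_id),
    timestamp      TEXT NOT NULL,
    total_amount   REAL,
    payment_method TEXT
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the three core tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def orphan_transactions(conn: sqlite3.Connection) -> list:
    """Return transaction_ids whose user_id or item_id has no parent row.

    SQLite does not enforce foreign keys by default, so the check is explicit.
    """
    rows = conn.execute(
        """
        SELECT t.transaction_id
        FROM transactions AS t
        LEFT JOIN users AS u ON u.user_id = t.user_id
        LEFT JOIN items AS i ON i.item_id = t.item_id
        WHERE u.user_id IS NULL OR i.item_id IS NULL
        ORDER BY t.transaction_id
        """
    ).fetchall()
    return [row[0] for row in rows]
```

Running a check like this on the raw data before building the graph can catch broken links early.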

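Entity Relationship Diagram (ERD)

Transactions references Users through user_id and Items through item_id, so each transaction belongs to exactly one user and one item.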

Predictive Queries

One challenge in buy-it-again recommendations is differentiating repeat purchases from one-time buys. A simple model using only past repeat purchases misses out on important behavioral signals.

We train a general item-to-user recommendation model and apply filters at prediction time, ensuring:

  • The model learns overall user-item affinity.

  • The user receives only buy-it-again recommendations.

PREDICT LIST_DISTINCT(transactions.item_id, 0, X, days) RANK TOP 50
FOR EACH users.user_id
WHERE COUNT(transactions.*, -D, 0, days) >= N

This query:

  • Predicts the top 50 distinct items each user is likely to purchase; the buy-it-again restriction is applied afterwards as a filter.

  • Looks at a future X-day window.

  • Restricts predictions to active users with at least N purchases in the last D days, which avoids empty recommendation sets after filtering.
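To make the two time windows concrete, the sketch below evaluates both parts of the query for a single user in plain Python. It is not how Kumo computes anything internally; the function names, the tuple layout `(user_id, item_id, timestamp)`, and the choice of half-open windows (start excluded, end included) are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def is_active(purchases, user_id, anchor, d_days, n_min):
    """Mimic WHERE COUNT(transactions.*, -D, 0, days) >= N for one user.

    Counts purchases in the window (anchor - D days, anchor].
    """
    start = anchor - timedelta(days=d_days)
    count = sum(
        1 for u, _item, ts in purchases if u == user_id and start < ts <= anchor
    )
    return count >= n_min


def future_items(purchases, user_id, anchor, x_days):
    """Mimic LIST_DISTINCT(transactions.item_id, 0, X, days) for one user.

    Returns the distinct items bought in the window (anchor, anchor + X days].
    """
    end = anchor + timedelta(days=x_days)
    return {
        item for u, item, ts in purchases if u == user_id and anchor < ts <= end
    }


# Hypothetical purchase log: (user_id, item_id, timestamp).
purchases = [
    ("u1", "milk", datetime(2024, 1, 10)),
    ("u1", "eggs", datetime(2024, 1, 20)),
    ("u1", "milk", datetime(2024, 2, 5)),
    ("u1", "milk", datetime(2024, 2, 6)),
    ("u1", "tea", datetime(2024, 3, 15)),
    ("u2", "milk", datetime(2023, 11, 1)),
]
anchor = datetime(2024, 1, 31)

print(is_active(purchases, "u1", anchor, d_days=30, n_min=2))
print(future_items(purchases, "u1", anchor, x_days=7))
```

The first function decides whether a user is included at all; the second yields the set of items the model is trained to rank for that user.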

Filtering Out Newly Introduced Items

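The model ranks all items, including ones a user has never bought, such as newly launched items that nobody has had time to re-purchase. To exclude them, we join the predictions against each user's purchases up to the prediction anchor time in SQL post-processing: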

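-- Keep only predicted (user, item) pairs the user bought on or before the anchor time.
SELECT PREDICTIONS.*
FROM PREDICTIONS
JOIN (
    -- DISTINCT prevents duplicate output rows when an item was bought several times.
    SELECT DISTINCT entity_id, item_id
    FROM <ORDERS>
    WHERE timestamp <= PREDICTION_ANCHOR_TIME
) AS CANDIDATES
    ON PREDICTIONS.entity_id = CANDIDATES.entity_id
   AND PREDICTIONS.item_id = CANDIDATES.item_id;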
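The same post-processing join can be mirrored in plain Python, which may help when the predictions are consumed in application code rather than in a warehouse. This is a minimal sketch under stated assumptions: the function name `filter_buy_it_again` and the tuple layouts are hypothetical, and the prediction rows are assumed to arrive already sorted by score.

```python
from datetime import datetime


def filter_buy_it_again(predictions, purchases, anchor_time):
    """Keep predictions whose (user, item) pair was purchased by anchor_time.

    predictions: iterable of (user_id, item_id, score)
    purchases:   iterable of (user_id, item_id, timestamp)
    """
    # A set gives the same de-duplicating effect as SELECT DISTINCT.
    bought = {(u, i) for u, i, ts in purchases if ts <= anchor_time}
    # The input order is preserved, so an existing ranking stays intact.
    return [p for p in predictions if (p[0], p[1]) in bought]


# Hypothetical data for illustration only.
anchor_time = datetime(2024, 1, 31)
purchases = [
    ("u1", "milk", datetime(2024, 1, 10)),
    ("u1", "milk", datetime(2024, 1, 20)),
    ("u1", "tea", datetime(2024, 2, 3)),  # after the anchor, so ignored
]
predictions = [
    ("u1", "new_gadget", 0.9),
    ("u1", "milk", 0.8),
    ("u1", "tea", 0.7),
]

print(filter_buy_it_again(predictions, purchases, anchor_time))
```

Items the user has never bought, including anything newly launched, drop out because they have no matching purchase before the anchor time.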

Building Models with the Kumo SDK

This problem can be efficiently solved using Kumo AI, which simplifies ML modeling on relational data.

1. Initialize the Kumo SDK

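import os
import kumoai as kumo

# Assumption: the API key is stored in the KUMO_API_KEY environment variable.
API_KEY = os.environ["KUMO_API_KEY"]
kumo.init(url="https://<customer_id>.kumoai.cloud/api", api_key=API_KEY)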

2. Create a Connector for Data Storage

connector = kumo.S3Connector("s3://your-dataset-location/")

3. Select Tables

users = kumo.Table.from_source_table(
    source_table=connector.table('users'),
    primary_key='user_id',
).infer_metadata()

items = kumo.Table.from_source_table(
    source_table=connector.table('items'),
    primary_key='item_id',
).infer_metadata()

transactions = kumo.Table.from_source_table(
    source_table=connector.table('transactions'),
    time_column='timestamp',
).infer_metadata()

4. Create Graph Schema

graph = kumo.Graph(
    tables={
        'users': users,
        'items': items,
        'transactions': transactions,
    },
    edges=[
        dict(src_table='transactions', fkey='user_id', dst_table='users'),
        dict(src_table='transactions', fkey='item_id', dst_table='items'),
    ],
)

graph.validate(verbose=True)

5. Train the Model

pquery = kumo.PredictiveQuery(
    graph=graph,
    query=(
        # X is a placeholder: replace it with the prediction horizon in days.
        "PREDICT LIST_DISTINCT(transactions.item_id, 0, X, days) RANK TOP 50\n"
        "FOR EACH users.user_id"
    ),
)
pquery.validate(verbose=True)

model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)
training_job = trainer.fit(
    graph=graph,
    train_table=pquery.generate_training_table(non_blocking=True),
    non_blocking=False,
)
print(f"Training metrics: {training_job.metrics()}")
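The query string in step 5 keeps X as a literal placeholder, and the query in the Predictive Queries section adds D and N. A small helper can substitute concrete values before the string is handed to `kumo.PredictiveQuery`. The following is a minimal sketch; the function name, the parameter names, and the example values 30, 90, and 2 are hypothetical and not part of the Kumo SDK.

```python
def build_buy_it_again_query(
    horizon_days: int,
    lookback_days: int,
    min_purchases: int,
    top_k: int = 50,
) -> str:
    """Fill the X, D, and N placeholders of the buy-it-again query."""
    params = {
        "horizon_days": horizon_days,
        "lookback_days": lookback_days,
        "min_purchases": min_purchases,
        "top_k": top_k,
    }
    for name, value in params.items():
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return (
        f"PREDICT LIST_DISTINCT(transactions.item_id, 0, {horizon_days}, days) "
        f"RANK TOP {top_k}\n"
        "FOR EACH users.user_id\n"
        f"WHERE COUNT(transactions.*, -{lookback_days}, 0, days) >= {min_purchases}"
    )


# Illustrative values only: 30-day horizon, 90-day lookback, at least 2 purchases.
query = build_buy_it_again_query(horizon_days=30, lookback_days=90, min_purchases=2)
print(query)
```

Validating the parameters up front keeps a malformed window from reaching query validation on the platform.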

6. Run the Model

prediction_job = trainer.predict(
    graph=graph,
    prediction_table=pquery.generate_prediction_table(non_blocking=True),
    output_types={'predictions', 'embeddings'},
    output_connector=connector,
    output_table_name='buy_it_again_predictions',
    training_job_id=training_job.job_id,
    non_blocking=False,
)
print(f'Batch prediction job summary: {prediction_job.summary()}')