Solution Background and Business Value

Item-to-item recommendations help users discover products similar to what they are currently viewing or have purchased. This technique powers features like “You might also like” and “Frequently bought together” on e-commerce platforms. These recommendations:

  • Increase customer engagement by surfacing relevant products.

  • Boost conversion rates by promoting co-purchased or similar items.

  • Enhance personalization by learning from user behavior patterns.

Kumo AI can generate item embeddings optimized for co-purchase behavior rather than semantic similarity alone. For example, it can recommend cereal to someone buying milk instead of only suggesting other types of milk. It balances signals from purchases, views, and other interactions, so recommendations go beyond simple text matching.

Data Requirements and Schema

To train an item-to-item recommendation model, we need structured data that captures relationships between items based on user behavior.

Core Tables

  1. Purchase Item Pairs Table

    • Captures item co-occurrence based on user behavior (e.g., items bought in the same order or session).

    • Key attributes:

      • item_id_lhs: The primary item (left-hand side).

      • item_id_rhs: The similar/co-purchased item (right-hand side).

      • Optional: Purchase session, timestamps, user interactions.

  2. Items Table (LHS & RHS)

    • Represents unique items in the dataset (even if LHS and RHS contain the same data, Kumo requires them as separate tables).

    • Key attributes:

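      • item_id: Unique product identifier. The SDK examples in this guide name this key item_id_lhs in the LHS table and item_id_rhs in the RHS table.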

      • Optional: Category, price, brand, description, image embeddings.

  3. Users Table (Optional)

    • Stores customer details that can improve recommendations.

    • Key attributes:

      • user_id: Unique identifier.

      • Optional: Location, demographics, past purchase patterns.
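
The tables above can usually be derived from order-level data. The following is a minimal sketch, assuming a hypothetical `orders` mapping and `catalog` list (neither is part of Kumo). It emits one row per ordered pair of distinct items in the same order and duplicates the catalog into separate LHS and RHS tables with renamed keys.

```python
from itertools import permutations

# Hypothetical order-level input: order_id -> item IDs bought together.
orders = {
    "order_1": ["milk", "cereal", "bananas"],
    "order_2": ["milk", "cereal"],
    "order_3": ["bread"],
}

# Hypothetical product catalog: one row per unique item.
catalog = [
    {"item_id": "milk", "category": "dairy"},
    {"item_id": "cereal", "category": "breakfast"},
    {"item_id": "bananas", "category": "produce"},
    {"item_id": "bread", "category": "bakery"},
]


def build_item_pairs(orders):
    """Emit one row per ordered pair of distinct items within an order."""
    rows = []
    for order_id, items in orders.items():
        unique_items = sorted(set(items))
        for lhs, rhs in permutations(unique_items, 2):
            rows.append(
                {"order_id": order_id, "item_id_lhs": lhs, "item_id_rhs": rhs}
            )
    return rows


def rename_key(rows, old, new):
    """Copy rows, renaming the column `old` to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


pairs = build_item_pairs(orders)
items_lhs = rename_key(catalog, "item_id", "item_id_lhs")
items_rhs = rename_key(catalog, "item_id", "item_id_rhs")
```

Emitting both (A, B) and (B, A) treats co-purchase as symmetric. If direction matters (for example, the item viewed first versus the item added later), emit only the ordered pairs that reflect it.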

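Entity Relationship Diagram (ERD)

The purchase_item_pairs table links the two item tables: its item_id_lhs column references items_lhs, and its item_id_rhs column references items_rhs.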

Predictive Queries

The following predictive query ranks the top 20 most relevant items for each product:

PREDICT LIST_DISTINCT(purchase_item_pairs.item_id_rhs)
RANK TOP 20
FOR EACH items_lhs.item_id_lhs

Time-Based Ranking

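To factor in temporal dynamics, add a time window to the aggregation. The window 0, N, days covers the N days following the prediction time, so this variant needs a time column on purchase_item_pairs: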

PREDICT LIST_DISTINCT(purchase_item_pairs.item_id_rhs, 0, N, days)
RANK TOP 20
FOR EACH items_lhs.item_id_lhs
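
Both query variants differ only in the optional time window, so they can be generated from one template. This is a small sketch; `build_item_to_item_query` is a hypothetical helper for illustration, not part of the Kumo SDK, and it only formats the query strings shown above.

```python
def build_item_to_item_query(top_k=20, window_days=None):
    """Return the item-to-item predictive query as a single-line string.

    When window_days is given, the aggregation is restricted to the
    window (0, window_days, days), as in the time-based variant.
    """
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    target = "purchase_item_pairs.item_id_rhs"
    if window_days is not None:
        if window_days <= 0:
            raise ValueError("window_days must be positive")
        target = f"{target}, 0, {window_days}, days"
    return (
        f"PREDICT LIST_DISTINCT({target}) "
        f"RANK TOP {top_k} "
        "FOR EACH items_lhs.item_id_lhs"
    )


static_query = build_item_to_item_query()
windowed_query = build_item_to_item_query(top_k=20, window_days=30)
```

The resulting string can be passed as the query argument when constructing a predictive query in the SDK.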

Handling Cold-Start Items

For businesses where new items frequently appear, consider enabling Kumo’s cold-start handling in the model plan:

handle_new_target_entities: true

This lets the model rely on item attributes when an item has no purchase history.

Deployment Strategy

Batch Recommendations in a Key-Value Store

A common way to serve item recommendations is to precompute them and store them in a key-value store for low-latency retrieval:

ITEM_ID_1 : {ITEM_ID_A, ITEM_ID_B, ..., ITEM_ID_N} 
# LHS_ITEM : {Recommended items for LHS_ITEM}

At runtime, the application retrieves the recommendations for a given product with a single key lookup.
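
The precompute-and-lookup pattern can be sketched as follows. The row format `(lhs_item, rhs_item, score)` is an assumption about how batch predictions are exported, and a plain dictionary stands in for a real key-value store such as Redis or DynamoDB.

```python
from collections import defaultdict

# Hypothetical batch prediction output: (lhs_item, rhs_item, score) rows.
prediction_rows = [
    ("milk", "cereal", 0.91),
    ("milk", "bananas", 0.42),
    ("milk", "oat_milk", 0.77),
    ("bread", "butter", 0.88),
]


def build_recommendation_index(rows, top_k):
    """Group rows by LHS item and keep the top_k RHS items by score."""
    grouped = defaultdict(list)
    for lhs, rhs, score in rows:
        grouped[lhs].append((score, rhs))
    index = {}
    for lhs, scored in grouped.items():
        scored.sort(key=lambda pair: pair[0], reverse=True)
        index[lhs] = [rhs for _, rhs in scored[:top_k]]
    return index


# In production this dictionary would be written to the key-value store.
kv_store = build_recommendation_index(prediction_rows, top_k=2)


def get_recommendations(item_id):
    """Runtime lookup: one key read, empty list for unknown items."""
    return kv_store.get(item_id, [])
```

Storing the list already sorted by score keeps the runtime path to a single read with no ranking work.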

Using Embeddings for Similarity Scores

Instead of retrieving static predictions, we can use dot product similarity between embeddings to dynamically compute item similarity scores. This enables:

  • Personalized recommendations based on the user’s session.

  • Real-time filtering to exclude items already viewed by the user.

Example workflow:

  1. Cache top-10 recommendations for each item in a key-value store.

  2. Store LHS and RHS embeddings in a vector database.

  3. During a session:

    • Show cached recommendations first.

    • If the user exhausts cached items, backfill recommendations dynamically using dot product similarity.

This hybrid approach balances speed and flexibility.
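
The workflow above can be sketched in a few lines. All data here is made up for illustration: `cached_recs` stands in for the key-value store, and the two embedding dictionaries stand in for a vector database, which in practice would perform the nearest-neighbor search instead of the brute-force scan shown.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical cached recommendations (key-value store stand-in).
cached_recs = {"milk": ["cereal", "oat_milk"]}

# Hypothetical LHS and RHS embeddings (vector database stand-in).
lhs_embeddings = {"milk": [0.9, 0.1, 0.3]}
rhs_embeddings = {
    "cereal": [0.8, 0.2, 0.1],
    "oat_milk": [0.7, 0.0, 0.6],
    "bananas": [0.5, 0.4, 0.2],
    "butter": [0.1, 0.9, 0.0],
    "coffee": [0.6, 0.1, 0.5],
}


def recommend(item_id, seen, k):
    """Return up to k unseen items: cached results first, then backfill."""
    # Step 1: serve cached recommendations the user has not seen yet.
    recs = [r for r in cached_recs.get(item_id, []) if r not in seen][:k]
    # Step 2: backfill by LHS-to-RHS dot product once the cache runs short.
    if len(recs) < k and item_id in lhs_embeddings:
        query = lhs_embeddings[item_id]
        excluded = set(seen) | set(recs) | {item_id}
        candidates = [
            (dot(query, vec), rhs)
            for rhs, vec in rhs_embeddings.items()
            if rhs not in excluded
        ]
        candidates.sort(reverse=True)
        recs.extend(rhs for _, rhs in candidates[: k - len(recs)])
    return recs


fresh_session = recommend("milk", seen=set(), k=2)
exhausted_cache = recommend("milk", seen={"cereal", "oat_milk"}, k=2)
```

Filtering against `seen` in both steps is what keeps already-viewed items out of the results, whether they come from the cache or from the backfill.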

Building Models in the Kumo SDK

1. Initialize the Kumo SDK

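import os
import kumoai as kumo

# Assumes the API key is stored in the KUMO_API_KEY environment variable.
API_KEY = os.environ["KUMO_API_KEY"]
kumo.init(url="https://<customer_id>.kumoai.cloud/api", api_key=API_KEY)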

2. Create a Connector for Data Storage

connector = kumo.S3Connector("s3://your-dataset-location/")

3. Select Tables

items_lhs = kumo.Table.from_source_table(
    source_table=connector.table('items_lhs'),
    primary_key='item_id_lhs',
).infer_metadata()

items_rhs = kumo.Table.from_source_table(
    source_table=connector.table('items_rhs'),
    primary_key='item_id_rhs',
).infer_metadata()

purchase_item_pairs = kumo.Table.from_source_table(
    source_table=connector.table('purchase_item_pairs'),
    time_column=None,  # static query; a time column is needed for time-based ranking
).infer_metadata()

4. Create Graph Schema

graph = kumo.Graph(
    tables={
        'items_lhs': items_lhs,
        'items_rhs': items_rhs,
        'purchase_item_pairs': purchase_item_pairs,
    },
    edges=[
        dict(src_table='purchase_item_pairs', fkey='item_id_lhs', dst_table='items_lhs'),
        dict(src_table='purchase_item_pairs', fkey='item_id_rhs', dst_table='items_rhs'),
    ],
)

graph.validate(verbose=True)

5. Train the Model

pquery = kumo.PredictiveQuery(
    graph=graph,
    query="PREDICT LIST_DISTINCT(purchase_item_pairs.item_id_rhs) RANK TOP 20 FOR EACH items_lhs.item_id_lhs"
)
pquery.validate(verbose=True)

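# Start from the suggested model plan; adjust it before training if needed.
model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)
# Generate the training table asynchronously, then block until training completes.
training_job = trainer.fit(
    graph=graph,
    train_table=pquery.generate_training_table(non_blocking=True),
    non_blocking=False,
)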
print(f"Training metrics: {training_job.metrics()}")