Solution Background and Business Value

Cold start recommendation is a common challenge on e-commerce platforms such as Amazon and eBay, where new products are introduced constantly. It is even harder on platforms such as Eventbrite or Ticketmaster, where every item (e.g., an event or ticket) is new and never repeats.

The problem matters most for businesses where recommendation quality directly affects user engagement and revenue. Improving cold start recommendations leads to:

  • Better user experiences by ensuring relevant suggestions, even for new items.

  • Higher conversions and sales by surfacing new but relevant products.

  • Stronger user retention by keeping recommendations fresh and personalized.

At Kumo AI, we approach this problem using two key strategies:

  1. Feature-Based Learning: Kumo learns patterns from rich item attributes (e.g., category, brand, price) to infer relevance for cold start items.

  2. Graph Neural Network (GNN) Propagation: Kumo connects new items to existing ones using shared attributes (e.g., brand, category, location). This allows the system to leverage existing item signals for cold start recommendations.
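
To make the second strategy concrete, here is a toy sketch of the idea of borrowing signal from existing items through shared attributes. It is not Kumo's GNN: it simply averages the vectors of existing items that share a brand or category with the new item. The item names, attributes, and vectors are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical catalog: item_id -> attributes. "new_shoe" has no order history.
ITEMS = {
    "shoe_a": {"brand": "acme", "category": "shoes"},
    "shoe_b": {"brand": "acme", "category": "shoes"},
    "hat_c": {"brand": "zenith", "category": "hats"},
    "new_shoe": {"brand": "acme", "category": "shoes"},
}

# Hypothetical vectors learned from past orders (existing items only).
WARM_EMBEDDINGS = {
    "shoe_a": [1.0, 0.0],
    "shoe_b": [0.0, 1.0],
    "hat_c": [-1.0, -1.0],
}


def attribute_neighbors(item_id, items):
    """Return the other items that share at least one (attribute, value) pair."""
    index = defaultdict(set)
    for other_id, attrs in items.items():
        for key, value in attrs.items():
            index[(key, value)].add(other_id)
    neighbors = set()
    for key, value in items[item_id].items():
        neighbors |= index[(key, value)]
    neighbors.discard(item_id)
    return neighbors


def cold_start_embedding(item_id, items, warm_embeddings):
    """Average the vectors of existing neighbors; return None if there are none."""
    vectors = [
        warm_embeddings[n]
        for n in sorted(attribute_neighbors(item_id, items))
        if n in warm_embeddings
    ]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


new_vec = cold_start_embedding("new_shoe", ITEMS, WARM_EMBEDDINGS)
print(new_vec)
```

Here the new item inherits its starting vector from the two existing items of the same brand and category, while the unrelated hat contributes nothing. A GNN learns a richer version of this aggregation instead of using a fixed average.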

Data Requirements and Schema

To build an effective cold start recommendation model, we need a relational dataset that describes users, items, and the interactions between them. Kumo AI also lets us enhance the model over time by incorporating additional data sources.

Core Tables

The three essential tables required for this solution are:

  1. Orders Table

    • Stores interactions between users and items (e.g., purchases, event registrations).

    • Key attributes:

      • customer_id: User identifier.

      • item_id: Item identifier.

      • timestamp: When the interaction occurred.

      • Other optional features: purchase amount, event type.

  2. Customers Table

    • Stores user-related information.

    • Key attributes:

      • customer_id: Unique user identifier.

      • Other optional features: age, location, join date.

  3. Items Table

    • Stores product or event details.

    • Key attributes:

      • item_id: Unique identifier for each item.

      • start_timestamp / end_timestamp: Availability period of the item.

      • Other optional features: category, brand, price, color.
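
As a rough illustration of how the three core tables fit together, the sketch below models one row of each table as a Python dataclass and checks that every order points at a known customer and item. The optional columns chosen here (location, category), the sample rows, and the rule that an order should fall inside the item's availability window are assumptions made for the example, not requirements stated by Kumo.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Customer:
    customer_id: str
    location: Optional[str] = None  # optional feature


@dataclass
class Item:
    item_id: str
    start_timestamp: datetime
    end_timestamp: datetime
    category: Optional[str] = None  # optional feature


@dataclass
class Order:
    customer_id: str
    item_id: str
    timestamp: datetime


def find_broken_orders(orders, customers, items):
    """Return orders with an unknown key or a timestamp outside the item's window."""
    customer_ids = {c.customer_id for c in customers}
    items_by_id = {i.item_id: i for i in items}
    broken = []
    for order in orders:
        item = items_by_id.get(order.item_id)
        if order.customer_id not in customer_ids or item is None:
            broken.append(order)
        # Assumed rule: an order should occur while the item is available.
        elif not (item.start_timestamp <= order.timestamp <= item.end_timestamp):
            broken.append(order)
    return broken


# Hypothetical sample rows.
customers = [Customer("c1", "Berlin")]
items = [Item("i1", datetime(2024, 1, 1), datetime(2024, 1, 31), "concert")]
orders = [
    Order("c1", "i1", datetime(2024, 1, 10)),  # valid
    Order("c2", "i1", datetime(2024, 1, 11)),  # unknown customer
    Order("c1", "i1", datetime(2024, 3, 1)),   # outside the availability window
]

broken = find_broken_orders(orders, customers, items)
print(len(broken))
```

A check like this, run before loading data, catches dangling keys early, since both links from the orders table depend on customer_id and item_id matching the other two tables.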

Additional Tables (Optional)

For improved cold start recommendations, consider adding these:

  1. Brands Table: Links items with brands, enabling item-to-item similarity.

  2. Item Metadata Tables: Capture hierarchical relationships between items (e.g., event type, location).

  3. Behavioral Tables: Include clicks, wishlist adds, and reviews, which can be used to enrich recommendations.

Entity Relationship Diagram (ERD)

[Diagram: the orders table links to customers through customer_id and to items through item_id.]

Predictive Queries

We can train two types of recommendation models to handle cold start cases:

I. Temporal Recommendation (For Personalized Suggestions)

  • Handles cold start items dynamically by leveraging item features.

  • Best when there is a mix of new and existing items.

PREDICT LIST_DISTINCT(orders.item_id, 0, 7, days) RANK TOP K
FOR EACH customers.customer_id

// If most items are new:
// module: link_prediction_embedding
// handle_new_entities: true
// target_embedding_mode: feature

// If a mix of old and new items:
// module: link_prediction_ranking
// handle_new_entities: false
// target_embedding_mode: fusion
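
The choice between the two option sets above can be expressed as a small helper. This is only a sketch: the option names and values come from the comments above, but the 0.5 cutoff for "most items are new" and both function names are hypothetical, and the code does not show how the options are passed to Kumo.

```python
def share_of_new_items(candidate_item_ids, ordered_item_ids):
    """Fraction of candidate items that never appear in the orders table."""
    candidates = set(candidate_item_ids)
    if not candidates:
        return 0.0
    seen = set(ordered_item_ids)
    return len(candidates - seen) / len(candidates)


def choose_link_prediction_options(new_item_share, mostly_new_threshold=0.5):
    """Pick one of the two option sets based on the share of new items.

    mostly_new_threshold is an assumed cutoff, not a Kumo default.
    """
    if not 0.0 <= new_item_share <= 1.0:
        raise ValueError("new_item_share must be between 0 and 1")
    if new_item_share > mostly_new_threshold:
        # Most items are new: rely on item features only.
        return {
            "module": "link_prediction_embedding",
            "handle_new_entities": True,
            "target_embedding_mode": "feature",
        }
    # Mix of old and new items: combine features with learned item signals.
    return {
        "module": "link_prediction_ranking",
        "handle_new_entities": False,
        "target_embedding_mode": "fusion",
    }


share = share_of_new_items(["a", "b", "c", "d"], ["a", "a", "x"])
options = choose_link_prediction_options(share)
print(share, options["module"])
```

In practice the right cutoff depends on the catalog, so it is worth training both configurations and comparing them on held-out data.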

II. Static Link Prediction (For Item-to-Item Recommendations)

  • Uses relationships like brand, category, or event type to recommend similar items.

  • Best when the orders table lacks timestamps and connections between items are strong.

PREDICT LIST_DISTINCT(orders.item_id) RANK TOP K
FOR EACH customers.customer_id
// module: link_prediction_embedding
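
The two query shapes differ only in whether the target carries a time window. A minimal sketch of a helper that builds either string is shown below. The function name is hypothetical, and the temporal form spells out the `days` unit as in the SDK example in step 5.

```python
def build_recommendation_query(top_k, horizon_days=None):
    """Build the temporal query if horizon_days is given, else the static one."""
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    if horizon_days is None:
        # Static link prediction: no time window on the target.
        target = "LIST_DISTINCT(orders.item_id)"
    else:
        if horizon_days <= 0:
            raise ValueError("horizon_days must be positive")
        # Temporal recommendation: items ordered in the next horizon_days days.
        target = f"LIST_DISTINCT(orders.item_id, 0, {horizon_days}, days)"
    return f"PREDICT {target} RANK TOP {top_k} FOR EACH customers.customer_id"


temporal_query = build_recommendation_query(top_k=10, horizon_days=7)
static_query = build_recommendation_query(top_k=10)
print(temporal_query)
print(static_query)
```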

Deployment Strategy

Deploying cold start recommendations is similar to deploying a traditional personalized recommendation system, with some modifications:

  1. Precompute Recommendations for New Users & Items:

    • Generate default item recommendations for users with no history.

    • Assign category-based fallback recommendations if needed.

  2. Embedding-Based Candidate Generation:

    • Generate user and item embeddings daily.

    • Use these embeddings for real-time ranking and filtering.

  3. Integrate with Real-Time Systems:

    • Use precomputed recommendations for new users.

    • Apply real-time reranking for logged-in users with purchase history.
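
The serving logic implied by these steps can be sketched as follows. This is a simplified illustration under several assumptions: embeddings are plain Python lists, relevance is a dot product, reranking just filters out already purchased items, and all names and sample values are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend(user_id, user_embeddings, item_embeddings, precomputed, fallback,
              k=3, purchased=()):
    """Serve top-k items: precomputed lists for new users, embeddings otherwise."""
    if user_id not in user_embeddings:
        # New user: use a precomputed list if one exists, else the default list.
        return list(precomputed.get(user_id, fallback))[:k]
    user_vec = user_embeddings[user_id]
    # Known user: score every item that has not been purchased yet.
    scored = [
        (dot(user_vec, vec), item_id)
        for item_id, vec in item_embeddings.items()
        if item_id not in purchased
    ]
    # Highest score first; ties broken by item_id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:k]]


# Hypothetical daily embedding export and fallback list.
USER_EMBEDDINGS = {"u1": [1.0, 0.0]}
ITEM_EMBEDDINGS = {"i1": [0.9, 0.1], "i2": [0.1, 0.9], "i3": [0.5, 0.5]}
FALLBACK = ["i3", "i1"]

known = recommend("u1", USER_EMBEDDINGS, ITEM_EMBEDDINGS, {}, FALLBACK,
                  k=2, purchased={"i1"})
new = recommend("u_new", USER_EMBEDDINGS, ITEM_EMBEDDINGS, {}, FALLBACK, k=2)
print(known, new)
```

A production system would typically replace the exhaustive scoring loop with an approximate nearest neighbor index, but the split between precomputed lists for new users and embedding-based scoring for known users stays the same.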

Building Models in the Kumo SDK

1. Initialize the Kumo SDK

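import os
import kumoai as kumo

# Replace <customer_id> with your deployment's ID. The KUMO_API_KEY variable name is an assumption.
API_KEY = os.environ["KUMO_API_KEY"]
kumo.init(url="https://<customer_id>.kumoai.cloud/api", api_key=API_KEY)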

2. Connect data

connector = kumo.S3Connector("s3://your-dataset-location/")

3. Select tables

customers = kumo.Table.from_source_table(
    source_table=connector.table('customers'),
    primary_key='customer_id',
).infer_metadata()

items = kumo.Table.from_source_table(
    source_table=connector.table('items'),
    primary_key='item_id',
).infer_metadata()

# orders is the interaction table: it has no primary key, only a time column.
orders = kumo.Table.from_source_table(
    source_table=connector.table('orders'),
    time_column='timestamp',
).infer_metadata()

4. Create graph schema

graph = kumo.Graph(
    tables={
        'customers': customers,
        'items': items,
        'orders': orders,
    },
    edges=[
        dict(src_table='orders', fkey='customer_id', dst_table='customers'),
        dict(src_table='orders', fkey='item_id', dst_table='items'),
    ],
)

graph.validate(verbose=True)

5. Train the model

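# Replace X with the prediction horizon in days (e.g., 7, as in the temporal query above).
pquery = kumo.PredictiveQuery(
    graph=graph,
    query="PREDICT LIST_DISTINCT(orders.item_id, 0, X, days) RANK TOP 50 FOR EACH customers.customer_id"
)
pquery.validate(verbose=True)

# Start from the suggested model plan, then train on the generated training table.
model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)
training_job = trainer.fit(
    graph=graph,
    train_table=pquery.generate_training_table(non_blocking=True),
    non_blocking=False,  # wait for training to finish before reading metrics
)
print(f"Training metrics: {training_job.metrics()}")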