Search Recommendations
Solution Background and Business Value
Traditional e-commerce search engines rely on lexical ranking functions such as Okapi BM25, or on large language models (LLMs) that measure the semantic similarity between queries and product descriptions. While effective, these methods consider only textual relevance and do not use historical user behavior to optimize for purchases.
A data-driven approach can enhance search rankings by integrating historical business data. This ensures that the most relevant and high-converting products appear at the top, benefiting both the user experience and business revenue. Personalized search considers factors such as:
- User preferences and past interactions to tailor results.
- Conversion likelihood, ranking items higher if they are more likely to be purchased.
- Business objectives, ensuring high-value items or trending products gain visibility.
Data Requirements and Schema
To build an optimized search recommendation model, we need structured historical data. Kumo AI enables us to extend the model by incorporating additional signals over time.
Core Tables
- Queries Table
  - Stores user search queries.
  - Key attributes:
    - query_id: Unique identifier for the search query.
    - query: Text of the search.
    - Other optional attributes: query category, external LLM embeddings.
- Items Table
  - Stores product information.
  - Key attributes:
    - item_id: Unique identifier for each item.
    - start_timestamp / end_timestamp: Availability window.
    - Other optional attributes: description, color, category, LLM embeddings, and vision embeddings.
- Users Table
  - Stores customer information.
  - Key attributes:
    - user_id: Unique identifier for each user.
    - Other optional features: age, location, join_timestamp.
- Add to Cart Table
  - Captures search interactions leading to purchases.
  - Key attributes:
    - timestamp: When the event happened.
    - query_id, item_id, user_id: Links interactions to queries, items, and users.
    - Other optional attributes: purchase likelihood, engagement time.
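As a minimal sketch of the four core tables, the Python dataclasses below mirror the key attributes listed above. The class names, the choice of string identifiers, the optional `end_timestamp`, and the `dangling_keys` helper are assumptions made for illustration; they are not part of any Kumo API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Query:
    query_id: str  # primary key
    query: str     # raw search text


@dataclass
class Item:
    item_id: str                              # primary key
    start_timestamp: datetime                 # start of availability window
    end_timestamp: Optional[datetime] = None  # assumption: None means still available


@dataclass
class User:
    user_id: str                               # primary key
    join_timestamp: Optional[datetime] = None  # optional feature


@dataclass
class AddToCart:
    timestamp: datetime  # when the event happened
    query_id: str        # foreign key to Query
    item_id: str         # foreign key to Item
    user_id: str         # foreign key to User


def dangling_keys(events: List[AddToCart], queries: List[Query],
                  items: List[Item], users: List[User]) -> List[AddToCart]:
    """Return events that reference a query, item, or user that does not exist."""
    query_ids = {q.query_id for q in queries}
    item_ids = {i.item_id for i in items}
    user_ids = {u.user_id for u in users}
    return [
        e for e in events
        if e.query_id not in query_ids
        or e.item_id not in item_ids
        or e.user_id not in user_ids
    ]
```

Running a check like `dangling_keys` before building the graph can help surface broken links between the interaction table and the three entity tables.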
Entity Relationship Diagram (ERD)
Predictive Queries
To recommend the most relevant products for a given search query while also personalizing results, we need two models:
Model 1: Query-Item Relevance
- Purpose: Identifies the most relevant items for a search query.
- Training Data: Search queries and add-to-cart interactions.
- Predictive Query:
Model 2: User-Item Personalization
- Purpose: Re-ranks the query results based on user preferences.
- Training Data: User-item interactions from add-to-cart events.
- Predictive Query:
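As a rough illustration of what each model is asked to learn, the plain-Python sketch below computes, from historical add-to-cart events, the items most often added to cart per query (the Model 1 view) or per user (the Model 2 view) within a time window. This is not Kumo predictive query syntax. The function name, the dictionary-based event format, and the 30-day horizon in the example are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def top_items_by_key(events, key, start, horizon_days, k):
    """For each value of `key` ("query_id" or "user_id"), return up to k
    item_ids most often added to cart in [start, start + horizon_days)."""
    end = start + timedelta(days=horizon_days)
    counts = defaultdict(Counter)
    for event in events:
        if start <= event["timestamp"] < end:
            counts[event[key]][event["item_id"]] += 1
    return {
        value: [item_id for item_id, _ in counter.most_common(k)]
        for value, counter in counts.items()
    }


# Hypothetical events: three inside a 30-day window, one outside it.
t0 = datetime(2024, 1, 1)
events = [
    {"timestamp": t0 + timedelta(days=1), "query_id": "q1", "item_id": "i1", "user_id": "u1"},
    {"timestamp": t0 + timedelta(days=2), "query_id": "q1", "item_id": "i1", "user_id": "u2"},
    {"timestamp": t0 + timedelta(days=3), "query_id": "q1", "item_id": "i2", "user_id": "u1"},
    {"timestamp": t0 + timedelta(days=40), "query_id": "q1", "item_id": "i3", "user_id": "u1"},
]

per_query = top_items_by_key(events, "query_id", t0, 30, k=2)  # Model 1 view
per_user = top_items_by_key(events, "user_id", t0, 30, k=2)    # Model 2 view
```

The same event log feeds both views; only the grouping key changes, which is why both models can be trained from the Add to Cart table.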
Deployment Strategy
- Batch Processing for Query Recommendations:
  - Use Model 1 to generate top product recommendations for each query.
  - Refresh these predictions daily to reflect product availability and buying trends.
- User and Item Embeddings:
  - Use Model 2 to generate user and item embeddings.
  - Refresh embeddings daily to maintain up-to-date personalization.
- Real-Time Query Handling:
  - When a user submits a search query, retrieve the precomputed recommendations.
  - Re-rank results using dot product similarity between user and item embeddings.
  - Use Elasticsearch or similar tools for efficient real-time retrieval and ranking.
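The re-ranking step can be sketched in a few lines of Python. The example assumes the Model 1 recommendations and the Model 2 embeddings have already been exported into in-memory dictionaries; the function names, the two-dimensional vectors, and the rule that items without an embedding go to the end are illustrative assumptions, not part of the Kumo SDK or Elasticsearch.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rerank(candidates, user_embedding, item_embeddings):
    """Order candidate item_ids by dot product with the user embedding,
    highest first. Items with no embedding keep their original relative
    order and are placed after the scored items."""
    scored = [c for c in candidates if c in item_embeddings]
    unscored = [c for c in candidates if c not in item_embeddings]
    scored.sort(key=lambda item_id: dot(user_embedding, item_embeddings[item_id]),
                reverse=True)
    return scored + unscored


# Hypothetical precomputed outputs of the daily batch jobs.
query_recommendations = {"running shoes": ["i1", "i2", "i3"]}  # from Model 1
item_embeddings = {"i1": [1.0, 0.0], "i2": [0.0, 1.0], "i3": [0.5, 0.5]}  # from Model 2
user_embedding = [0.2, 0.9]  # from Model 2

candidates = query_recommendations["running shoes"]
personalized = rerank(candidates, user_embedding, item_embeddings)
```

Because the candidate list per query is short, the dot products can be computed at request time; only the candidate generation and the embeddings need the daily batch refresh.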
Handling Anonymous Users
- If a user is not logged in, use the query recommendations without personalization.
- The ranking from Model 1 provides a strong default order.
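A minimal sketch of this fallback is shown below, assuming precomputed recommendations and embeddings are available as dictionaries. The function name `serve_results` and its parameters are hypothetical. It also falls back to the Model 1 order when a logged-in user has no embedding yet, which is an added assumption.

```python
def serve_results(query_id, user_id, query_recs, user_embeddings, item_embeddings):
    """Return item_ids for a query, personalized only when possible."""
    candidates = list(query_recs.get(query_id, []))

    # Anonymous user, or a user without an embedding: keep the Model 1 order.
    user_vec = user_embeddings.get(user_id) if user_id is not None else None
    if user_vec is None:
        return candidates

    def score(item_id):
        item_vec = item_embeddings.get(item_id)
        if item_vec is None:
            return float("-inf")  # items without an embedding sort last
        return sum(u * i for u, i in zip(user_vec, item_vec))

    # sorted() is stable, so tied items keep their Model 1 order.
    return sorted(candidates, key=score, reverse=True)
```

For example, calling `serve_results("q1", None, ...)` returns the stored Model 1 list unchanged, while passing a known `user_id` returns the same items in personalized order.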
Integrating with Existing Search Engines
- If using Okapi BM25 or another retrieval algorithm, apply Model 2 to re-rank results.
Building Models in the Kumo SDK
1. Initialize the Kumo SDK
2. Connect data
3. Select tables
4. Create graph schema
5. Train the model