
Search Ranking

For each user's search query, which products should rank highest?


A real-world example

For each user's search query, which products should rank highest?

Default search engines rank by text relevance and popularity. Two users searching "running shoes" see the same results even though one is a trail runner and the other runs on pavement. Click-through rates on search results average 15-20% when they could be 35-50% with personalization. For an ecommerce site doing $1B in GMV, search drives 40% of revenue — a 20% improvement in search conversion is worth $80M annually.

Quick answer

Personalized search ranking re-orders product search results based on each user's purchase history, browsing behavior, and similarity to other users. Two users searching 'running shoes' see different results because one is a trail runner and the other runs on pavement. Graph-based re-ranking improves search conversion by 20-40% over default text-relevance ranking.

Approaches compared

4 ways to solve this problem

1. Text Relevance (BM25 / TF-IDF)

Rank search results by text match between the query and product titles/descriptions. The default for most ecommerce search engines (Elasticsearch, Solr).

Best for

Queries where text relevance is the primary signal and personalization is not needed (e.g., searching for a specific SKU or brand).

Watch out for

Everyone searching 'running shoes' sees the same results. A trail runner and a road runner have completely different intent, but text relevance cannot distinguish them. Click-through rates sit at 15-20% when they could be 35-50%.
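To make the one-size-fits-all behavior concrete, here is a minimal, stdlib-only sketch of the BM25 scoring formula applied to toy product titles (the product IDs and titles reuse the example catalog later on this page; scores are illustrative, and production engines like Elasticsearch implement a tuned variant of this):

```python
import math
from collections import Counter

# Toy catalog: BM25 only sees this text, never the user.
products = {
    "P401": "TrailMax Pro GTX trail running shoes",
    "P402": "StreetRunner Lite road running shoes",
    "P408": "Summit Ridge Trainer trail running shoes",
}

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Classic BM25: term frequency, inverse document frequency, length norm."""
    tokenized = {pid: text.lower().split() for pid, text in docs.items()}
    N = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / N
    df = Counter()                      # document frequency per term
    for toks in tokenized.values():
        df.update(set(toks))
    scores = {}
    for pid, toks in tokenized.items():
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores[pid] = s
    return scores

# The query is the only input: every user gets this identical ordering.
ranked = sorted(bm25_scores("running shoes", products).items(),
                key=lambda kv: kv[1], reverse=True)
```

Note that nothing in the function signature can distinguish a trail runner from a road runner; the only tie-breaker among equally matching titles is document length.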

2. Popularity-Boosted Search

Boost search results by global popularity metrics: best-selling, highest-rated, most-reviewed. A simple improvement over pure text relevance.

Best for

High-traffic searches where popular products genuinely satisfy most users. Reduces the risk of surfacing obscure low-quality results.

Watch out for

Still not personalized. Popular products are not always the right products for each user. A bestselling road running shoe is irrelevant to a trail runner, regardless of its popularity.
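A sketch of the usual implementation, assuming made-up relevance scores and sales counts: multiply the text score by a log-damped popularity boost so bestsellers rise without fully drowning out relevance. The ordering is still identical for every user.

```python
import math

# Illustrative inputs: text scores (e.g. BM25 output) and global sales counts.
text_score = {"P401": 2.1, "P402": 2.3, "P408": 2.1}
units_sold = {"P401": 1_200, "P402": 58_000, "P408": 900}

def boosted(pid, alpha=0.3):
    # log1p damping keeps a 58,000-unit bestseller from dominating entirely;
    # alpha trades off relevance vs. popularity.
    return text_score[pid] * (1 + alpha * math.log1p(units_sold[pid]))

# Global ranking: the bestselling road shoe wins for trail and road runners alike.
ranking = sorted(text_score, key=boosted, reverse=True)
```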

3. Learning to Rank (LambdaMART, RankNet)

Train a ranking model on features: text relevance, click-through rate, conversion rate, price, and user features. The current industry standard for personalized search.

Best for

Teams with ML infrastructure and large click-stream datasets. Captures non-linear feature interactions for ranking.

Watch out for

Requires extensive feature engineering: user-product affinity scores, category preference vectors, price sensitivity bins. Each new signal (return rate, browse depth, wishlist data) requires a new feature pipeline. The model treats each user-product pair independently, missing cross-user click patterns.
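The feature-engineering burden can be sketched with a toy scorer. The feature values and weights below are invented for illustration; a real system would learn the ranking function by training LambdaMART or RankNet (e.g. a gradient-boosted ranker with a lambdarank objective) on click logs, but each input feature would still need its own offline pipeline:

```python
# Each feature below is a separate offline pipeline in a real LTR stack.
# Fixed weights stand in for a trained LambdaMART/RankNet model.
WEIGHTS = {"text_relevance": 1.0, "product_ctr": 2.0, "category_affinity": 1.5}

def score(pair_features, weights=WEIGHTS):
    """Linear stand-in for the learned ranking function."""
    return sum(weights[k] * v for k, v in pair_features.items())

# Illustrative pre-computed (user, product) features for U001's
# "running shoes" query -- note category_affinity had to be engineered.
candidates = {
    "P401": {"text_relevance": 0.8, "product_ctr": 0.12, "category_affinity": 0.9},
    "P402": {"text_relevance": 0.9, "product_ctr": 0.30, "category_affinity": 0.1},
    "P408": {"text_relevance": 0.7, "product_ctr": 0.05, "category_affinity": 0.9},
}
ranked = sorted(candidates, key=lambda p: score(candidates[p]), reverse=True)
```

Adding a new signal (say, return rate) means building a new pipeline to populate a new key in every feature dict, then retraining.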

4. KumoRFM (Graph Neural Networks on Relational Data)

Builds a graph connecting users, searches, clicks, products, and purchases. Re-ranks results by learning from the full relational structure: purchase history, click patterns of similar users, location affinity, and return rates. No feature engineering required.

Best for

Ecommerce sites with rich user behavior data and diverse product catalogs where personalization significantly changes result relevance.

Watch out for

The graph advantage is largest for ambiguous queries ('running shoes,' 'dress') where user intent varies widely. For specific queries ('Nike Air Max 90 size 10'), text relevance alone is usually sufficient.
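A toy sketch of the relational structure the model operates on, built from the SEARCHES and CLICKS rows in the example tables below. This shows only the adjacency; the learned message passing and embeddings of a real GNN are omitted:

```python
from collections import defaultdict

# (search_id, user_id) and (search_id, product_id) rows,
# mirroring the SEARCHES and CLICKS example tables.
searches = [("S001", "U001"), ("S002", "U002"), ("S003", "U003")]
clicks = [("S001", "P401"), ("S001", "P408"), ("S002", "P402")]

# Join CLICKS -> SEARCHES to derive user <-> product edges.
search_user = dict(searches)
user_to_products = defaultdict(set)
product_to_users = defaultdict(set)
for search_id, product_id in clicks:
    user = search_user[search_id]
    user_to_products[user].add(product_id)
    product_to_users[product_id].add(user)

# A GNN would now pass messages along these edges, so U001's clicks on
# two trail shoes inform predictions for users who look like U001.
```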

Key metric: Graph-based search re-ranking improves conversion by 20-40%. RelBench benchmark: 76.71 vs 62.44 for flat baselines, with the largest gains on ambiguous queries where user intent varies.

Why relational data changes the answer

User U001 (outdoor enthusiast, Denver) searches 'running shoes.' Default search returns the same bestselling road running shoes shown to everyone. But the relational graph reveals that U001 bought 3 trail products in 6 months, 67% of users with similar purchase history clicked on the TrailMax Pro GTX, and Denver users have a trail product affinity score of 0.82. The click position bias correction also matters: U001 clicked on P401 at position 3, indicating high intent (they scrolled past 2 other products to click it).

These personalization signals span multiple tables. Purchase history is in PURCHASES. Click patterns require the CLICKS table joined to SEARCHES. Location affinity comes from the USERS table cross-referenced with PRODUCTS. Cross-user click patterns require aggregating behavior across similar users. A learning-to-rank model would need an engineer to pre-compute all of these as features. The graph neural network discovers them automatically and learns how they interact. On the RelBench benchmark, graph-based models score 76.71 vs 62.44 for flat-table baselines. For search ranking specifically, the improvement translates to 20-40% higher search conversion rates because the model surfaces the right products for each user, not just the globally popular ones.
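To see why these signals require joins rather than a single table lookup, here is a sketch of computing one of them: the click rate on a product among users with a similar purchase history. The users U005 and U006 and their rows are hypothetical, added purely to make the join logic runnable (the 67% figure in the text comes from a larger population):

```python
# Toy derived tables: purchase categories per user, and clicked products
# per user (the latter already joined from CLICKS -> SEARCHES).
purchases = {
    "U001": {"trail"}, "U005": {"trail"}, "U006": {"trail"}, "U002": {"road"},
}
user_clicks = {"U005": {"P401"}, "U006": {"P401", "P402"}, "U002": {"P402"}}

def neighbor_click_rate(target, product):
    """Fraction of similar users (shared purchase category) who clicked product."""
    similar = [u for u in purchases
               if u != target and purchases[u] & purchases[target]]
    if not similar:
        return 0.0
    return sum(product in user_clicks.get(u, set()) for u in similar) / len(similar)

# Both trail-purchasing neighbors of U001 clicked the TrailMax Pro GTX.
rate = neighbor_click_rate("U001", "P401")
```

A learning-to-rank pipeline would pre-compute a value like `rate` as one feature among many; the graph model instead reads it off the relational structure directly.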

Default search ranking is like a concierge who gives every hotel guest the same restaurant recommendation. Personalized graph-based ranking is like a concierge who remembers that you ordered vegan at the hotel restaurant, your travel companion mentioned a gluten allergy, and guests with similar preferences loved the farm-to-table spot on Fifth Street. The same question ('good restaurant nearby?') gets a completely different answer based on relational context.

How KumoRFM solves this

Relational intelligence for true personalization

Kumo re-ranks search results by learning from the full relational graph of user behavior, product attributes, and cross-user click patterns. It discovers that trail runners who search "running shoes" click on different products than road runners — and uses purchase history, return patterns, and graph neighborhood signals to personalize rankings. The model captures that users who bought hydration packs and trail GPS devices should see trail shoes first.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Step 1: Your data

The relational tables Kumo learns from

USERS

| user_id | segment | location |
|---------|---------|----------|
| U001 | outdoor_enthusiast | Denver, CO |
| U002 | casual_fitness | Miami, FL |
| U003 | competitive_runner | Boston, MA |

SEARCHES

| search_id | user_id | query | timestamp |
|-----------|---------|-------|-----------|
| S001 | U001 | running shoes | 2025-02-20 |
| S002 | U002 | running shoes | 2025-02-20 |
| S003 | U003 | lightweight trainers | 2025-02-21 |

CLICKS

| click_id | search_id | product_id | position | timestamp |
|----------|-----------|------------|----------|-----------|
| CL001 | S001 | P401 | 3 | 2025-02-20 |
| CL002 | S001 | P408 | 7 | 2025-02-20 |
| CL003 | S002 | P402 | 1 | 2025-02-20 |

PRODUCTS

| product_id | name | category | price |
|------------|------|----------|-------|
| P401 | TrailMax Pro GTX | Trail Running | 159.99 |
| P402 | StreetRunner Lite | Road Running | 89.99 |
| P408 | Summit Ridge Trainer | Trail Running | 139.99 |
Step 2: Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

```
PREDICT LIST_DISTINCT(CLICKS.PRODUCT_ID, 0, 7, days)
RANK TOP 20
FOR EACH USERS.USER_ID
```
Step 3: Prediction output

Every entity gets a score, updated continuously

| USER_ID | CLASS | SCORE | TIMESTAMP |
|---------|-------|-------|-----------|
| U001 | P401 | 0.91 | 2025-03-12 |
| U001 | P408 | 0.87 | 2025-03-12 |
| U002 | P402 | 0.84 | 2025-03-12 |
Step 4: Understand why

Every prediction includes feature attributions — no black boxes

User U001 (outdoor_enthusiast, Denver, CO)

Predicted: P401 (TrailMax Pro GTX) ranked #1 — score 0.91

Top contributing features

| Feature | Evidence | Attribution |
|---------|----------|-------------|
| Past trail running purchases | 3 trail products in 6 months | 31% |
| Graph neighbors clicked P401 | 67% of similar users clicked | 27% |
| Location affinity (mountain region) | Trail product affinity 0.82 | 20% |
| Click position bias correction | Clicked at position 3 (high intent) | 14% |
| Return rate for user segment | 0.04 (low returns) | 8% |

Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability

Frequently asked questions

Common questions about search ranking

How much does personalized search improve conversion?

Personalized search ranking improves search conversion rates by 20-40% compared to text-relevance-only ranking. For ecommerce sites with $1B+ GMV where search drives 40% of revenue, this translates to $50-80M in incremental annual revenue.

Does personalized search require user login?

Login improves personalization, but it is not required. Session-level behavior (clicks in the current session, products viewed, search refinements) provides enough signal for meaningful personalization. Graph models connect anonymous session behavior to patterns from logged-in users with similar interactions.

How do you handle position bias in search data?

Products shown at position 1 get clicked more often regardless of relevance (position bias). Graph models correct for this by learning the relationship between click position and true relevance. A click at position 7 is a stronger relevance signal than a click at position 1 because the user had to scroll past 6 alternatives.
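One standard correction is inverse propensity weighting, sketched below. The `1/position` examination model and the `eta` severity parameter are illustrative assumptions for this sketch, not a description of Kumo's internal method; real systems estimate propensities from randomized or logged data:

```python
def propensity(position, eta=1.0):
    """Estimated probability a user even examines the result at this position.

    Assumes examination decays as 1/position^eta; eta=1 is a common default,
    larger eta models stronger position bias.
    """
    return position ** -eta

def click_weight(position, eta=1.0):
    """Inverse-propensity weight: clicks at rarely-examined positions count more."""
    return 1.0 / propensity(position, eta)

# A click at position 7 carries roughly 7x the weight of a click at position 1,
# matching the intuition that the user scrolled past 6 alternatives.
w1, w7 = click_weight(1), click_weight(7)
```

Training a ranker on these reweighted clicks (rather than raw click counts) keeps it from simply re-learning whatever the old ranking already showed at the top.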

Can personalized search work for small catalogs?

The benefit scales with catalog size and query ambiguity. A catalog with 50 products does not need much re-ranking. A catalog with 50,000 products where 'running shoes' returns 200 results benefits enormously from personalization. The bigger the result set and the more ambiguous the query, the higher the impact.

Bottom line: 20-40% improvement in search conversion rate. For ecommerce sites with $1B+ GMV, personalized search ranking drives $50-80M in incremental annual revenue.

Topics covered

search ranking AI · personalized search results · search relevance optimization · ecommerce search ranking · graph neural network search · KumoRFM · predictive query language · search conversion optimization · user intent prediction · product search personalization · learning to rank · relational search signals

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.