What data is needed for propensity to buy?

Kumo connects directly to your existing relational tables: VISITORS, PAGE_VIEWS, ORDERS. No ETL or feature engineering required. Write a PQL query and get explainable predictions in minutes.

Use Case #4 · Binary Classification · Purchase Propensity

Propensity to Buy

“Which website visitors will make a purchase in the next 7 days?”

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example

Which website visitors will make a purchase in the next 7 days?

E-commerce and SaaS companies drive millions of site visits, but fewer than 3% convert to a purchase. Marketing teams blast the same promotions to everyone, wasting ad spend on visitors who were never going to buy and under-investing in visitors on the verge of purchasing. Without visitor-level propensity scores, personalization engines, ad bidding, and on-site merchandising operate blind.

Quick answer

Propensity-to-buy models predict which website visitors will make a purchase within a defined time window (typically 7 days). The best models learn from browsing sequences, session depth, traffic source, and cross-visitor behavioral patterns rather than simple page-view counts. Visitor-level propensity scores lift conversion rates by 2.5x when used for personalized offers and real-time bid adjustments.

Approaches compared

4 ways to solve this problem

1. Rule-Based Triggers

Fire conversion offers based on rules: 'if cart page visited AND session > 3 min, show popup.' The default in most CRO tools (Optimizely, VWO).

Best for

Simple e-commerce sites with well-defined conversion funnels and limited traffic complexity.

Watch out for

Rules are static and one-dimensional. A visitor who spent 4 minutes reading a blog post triggers the same rule as one who spent 4 minutes comparing products on the pricing page. The intent is completely different, but the rule cannot distinguish them.
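A minimal sketch of the rule quoted above makes the limitation concrete. The `PageView` type, the URLs, and the two sessions are hypothetical; the only thing taken from the text is the rule itself (cart page visited and session longer than 3 minutes).

```python
from dataclasses import dataclass


@dataclass
class PageView:
    url: str
    duration_sec: int


def should_show_popup(views: list[PageView]) -> bool:
    """Static rule: cart page visited AND total session time above 3 minutes."""
    visited_cart = any(v.url == "/cart" for v in views)
    session_sec = sum(v.duration_sec for v in views)
    return visited_cart and session_sec > 180


# Hypothetical sessions, both 4 minutes long and both touching the cart.
blog_reader = [PageView("/blog/guide", 235), PageView("/cart", 5)]
comparison_shopper = [
    PageView("/product/shoes", 45),
    PageView("/pricing", 165),
    PageView("/cart", 30),
]

# The rule fires for both sessions, although the intent behind them differs.
popup_for_blog_reader = should_show_popup(blog_reader)
popup_for_shopper = should_show_popup(comparison_shopper)
```

Because the rule only looks at two aggregates, nothing in it can separate the two sessions without adding yet another hand-written condition.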

2. Logistic Regression on Session Features

Train a simple classifier on session-level features: pages viewed, time on site, traffic source, device type. Score each visitor by conversion probability.

Best for

Teams that want a quick ML baseline. Easy to implement and interpret. Runs fast enough for real-time scoring.

Watch out for

Limited to session-level aggregates. Cannot capture page-view sequences (pricing then cart is different from cart then pricing) or cross-visitor patterns (visitors from the same campaign converting at different rates based on landing page).
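The loss of ordering can be sketched in a few lines. The feature names, weights, and bias below are illustrative assumptions, not fitted values; the point is only that two sessions with opposite page order collapse to the same feature vector and therefore the same score.

```python
import math

# Illustrative weights only; a real model would fit these from labeled sessions.
WEIGHTS = {
    "pages_viewed": 0.3,
    "time_on_site_sec": 0.004,
    "saw_pricing": 0.8,
    "saw_cart": 1.2,
}
BIAS = -4.0


def session_features(page_views):
    """Collapse (url, duration_sec) page views into order-free aggregates."""
    return {
        "pages_viewed": len(page_views),
        "time_on_site_sec": sum(duration for _, duration in page_views),
        "saw_pricing": any(url == "/pricing" for url, _ in page_views),
        "saw_cart": any(url == "/cart" for url, _ in page_views),
    }


def conversion_probability(features):
    """Logistic score: sigmoid of a weighted sum of the session features."""
    z = BIAS + sum(WEIGHTS[name] * float(value) for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


pricing_then_cart = [("/pricing", 120), ("/cart", 30)]
cart_then_pricing = [("/cart", 30), ("/pricing", 120)]

features_a = session_features(pricing_then_cart)
features_b = session_features(cart_then_pricing)
score_a = conversion_probability(features_a)
score_b = conversion_probability(features_b)
```

Both sessions produce identical aggregates, so any classifier trained on them scores the two visitors the same.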

3. Gradient Boosted Trees on Behavioral Features

Train XGBoost on hand-crafted behavioral features: page view sequences, scroll depth, click patterns, time on pricing page, cart interactions. The current industry standard.

Best for

Teams with ML engineers and behavioral analytics infrastructure. Good accuracy when features capture the right behavioral signals.

Watch out for

Feature engineering is the bottleneck. Each new behavioral signal (scroll depth, hover patterns, cross-device sessions) requires a new feature pipeline. The model treats each visitor independently, missing cross-visitor signals like 'visitors from this campaign who viewed this product convert at 3x the base rate.'
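As a rough illustration of that bottleneck, each behavioral signal ends up as its own small function that someone has to write, test, and maintain. The function and feature names here are hypothetical; the page sequence is V001's from the sample data.

```python
def pricing_before_cart(urls):
    """True when the first /pricing view comes before the first /cart view."""
    if "/pricing" not in urls or "/cart" not in urls:
        return False
    return urls.index("/pricing") < urls.index("/cart")


def seconds_on(page_views, url):
    """Total dwell time on one URL across (url, duration_sec) page views."""
    return sum(duration for u, duration in page_views if u == url)


def behavioral_features(page_views):
    """One hand-written entry per signal; every new signal adds another."""
    urls = [u for u, _ in page_views]
    return {
        "pricing_before_cart": pricing_before_cart(urls),
        "pricing_sec": seconds_on(page_views, "/pricing"),
        "cart_visited": "/cart" in urls,
    }


v001 = [("/product/shoes", 45), ("/pricing", 120), ("/cart", 30)]
features = behavioral_features(v001)
reversed_features = behavioral_features(list(reversed(v001)))
```

These features do capture order, but only the specific orderings someone thought to encode, and each visitor is still featurized in isolation.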

4. KumoRFM (Graph Neural Networks on Relational Data)

Connects visitors, page views, and orders into a temporal relational graph. Learns from browsing sequences, cross-visitor patterns, and traffic source signals automatically. Scores update continuously as new page views stream in.

Best for

High-traffic e-commerce and SaaS sites where visitor-level propensity scores drive real-time personalization, bid adjustment, and on-site merchandising.

Watch out for

Requires page-level or event-level visitor data with timestamps. If your analytics only tracks session-level aggregates, the model has fewer behavioral signals to work with.

Key metric: Visitor-level propensity scores lift conversion rates by 2.5x. SAP SALT benchmark: 91% accuracy for multi-table relational models vs 75% for single-table approaches.

Why relational data changes the answer

Visitor V001 arrived from paid search, viewed a product page for 45 seconds, spent 120 seconds on the pricing page, and visited the cart. A flat model sees these as four features and scores the conversion probability. But the relational graph captures the sequence and context: the pricing-then-cart sequence is a stronger signal than cart-then-pricing. The 120-second pricing page dwell time is 2.4x the average, indicating serious comparison, not casual browsing. And other visitors from the same paid search campaign who followed this exact sequence converted at 4.1x the base rate.

These cross-visitor patterns are invisible to models that score each visitor independently. The graph neural network propagates information from converted visitors to unconverted ones with similar browsing patterns, traffic sources, and product interests. The WHERE clause in the PQL query restricts scoring to visitors with 3+ page views in the last 7 days, so every scored visitor has enough behavioral signal to be scored meaningfully.

On the SAP SALT benchmark, relational models achieve 91% accuracy vs 75% for single-table models. For e-commerce propensity scoring, the relational advantage translates into higher conversion rates: a 2.5x lift when propensity scores drive personalized offers, retargeting bid adjustments, and real-time on-site merchandising.

Scoring visitors with session-level features is like a store greeter deciding who is likely to buy based only on how long they have been in the store. A relational model is like a seasoned sales associate who notices the visitor is carrying a competitor's shopping bag (traffic source), went straight to the premium section (page sequence), and resembles the customers who bought during last weekend's promotion (cross-visitor patterns). The time in store matters, but the relational context is what drives the recommendation.
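The cross-visitor comparison in this section (a cohort converting at a multiple of the base rate) reduces to a simple ratio. The sketch below uses toy counts that are not taken from this page, and it computes the lift by hand rather than learning it, which is the part a graph model would automate.

```python
def conversion_rate(outcomes):
    """Share of visitors who converted; outcomes is a list of booleans."""
    if not outcomes:
        raise ValueError("need at least one visitor")
    return sum(outcomes) / len(outcomes)


def lift(cohort_outcomes, all_outcomes):
    """Cohort conversion rate divided by the base rate across all visitors."""
    base = conversion_rate(all_outcomes)
    if base == 0:
        raise ValueError("base rate is zero; lift is undefined")
    return conversion_rate(cohort_outcomes) / base


# Toy numbers, not from the page: 1,000 visitors with 30 conversions, and a
# 50-visitor cohort (same campaign, same page sequence) with 6 conversions.
all_visitors = [True] * 30 + [False] * 970
cohort = [True] * 6 + [False] * 44
cohort_lift = lift(cohort, all_visitors)
```

A model that scores each visitor in isolation never sees the cohort at all, so a ratio like this has to be precomputed as a feature for every cohort definition that might matter.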

How KumoRFM solves this

Relational intelligence for smarter acquisition

Kumo ingests VISITORS, PAGE_VIEWS, and ORDERS into a temporal relational graph. The model learns sequences and cross-entity patterns — like 'visitors who viewed 5+ pages including pricing, from a paid source, within a session that lasted over 4 minutes' — and combines them with relational signals from other converting visitors. The WHERE clause filters to visitors with recent engagement, ensuring predictions are actionable. Scores update continuously as new page views stream in.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Your data

The relational tables Kumo learns from

VISITORS

visitor_id	source	device	first_seen
V001	paid_search	desktop	2025-11-10
V002	organic	mobile	2025-11-11
V003	email	desktop	2025-11-12
V004	direct	tablet	2025-11-12

PAGE_VIEWS

view_id	visitor_id	page_url	duration_sec	timestamp
PV01	V001	/product/shoes	45	2025-11-10
PV02	V001	/pricing	120	2025-11-10
PV03	V001	/cart	30	2025-11-11
PV04	V002	/blog/guide	90	2025-11-11
PV05	V003	/product/jacket	60	2025-11-12
PV06	V003	/pricing	85	2025-11-12

ORDERS

order_id	visitor_id	amount	timestamp
O801	V001	$149	2025-11-12
O802	V003	$225	2025-11-14
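To make the idea of a temporal relational graph concrete, here is a minimal sketch that links the sample rows above through their `visitor_id` foreign key and orders them by time. This is plain Python for illustration, not Kumo's internal representation; `build_visitor_graph` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date

# Rows copied from the sample PAGE_VIEWS and ORDERS tables above.
PAGE_VIEWS = [
    ("PV01", "V001", "/product/shoes", 45, date(2025, 11, 10)),
    ("PV02", "V001", "/pricing", 120, date(2025, 11, 10)),
    ("PV03", "V001", "/cart", 30, date(2025, 11, 11)),
    ("PV04", "V002", "/blog/guide", 90, date(2025, 11, 11)),
    ("PV05", "V003", "/product/jacket", 60, date(2025, 11, 12)),
    ("PV06", "V003", "/pricing", 85, date(2025, 11, 12)),
]
ORDERS = [
    ("O801", "V001", 149, date(2025, 11, 12)),
    ("O802", "V003", 225, date(2025, 11, 14)),
]


def build_visitor_graph(page_views, orders):
    """Group child rows under their visitor_id foreign key, oldest first."""
    graph = defaultdict(lambda: {"page_views": [], "orders": []})
    for view_id, visitor_id, url, duration_sec, ts in page_views:
        graph[visitor_id]["page_views"].append((ts, view_id, url, duration_sec))
    for order_id, visitor_id, amount, ts in orders:
        graph[visitor_id]["orders"].append((ts, order_id, amount))
    for node in graph.values():
        node["page_views"].sort()
        node["orders"].sort()
    return dict(graph)


graph = build_visitor_graph(PAGE_VIEWS, ORDERS)
v001_sequence = [url for _, _, url, _ in graph["V001"]["page_views"]]
```

Each visitor now carries a time-ordered sequence of page views and orders, which is the structure a relational model reads directly instead of a single flattened feature row.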

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

PREDICT COUNT(ORDERS.*, 0, 7, days) > 0
FOR EACH VISITORS.VISITOR_ID
WHERE COUNT(PAGE_VIEWS.*, -7, 0, days) >= 3
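The query can be read as a filter plus a label, both defined relative to an anchor time. The sketch below evaluates that reading on the sample rows. It assumes each window is open at its start and closed at its end, and it uses a threshold of 3 page views to match the "3+ page views" wording on this page; both are assumptions for illustration, not a statement of how Kumo evaluates PQL.

```python
from datetime import date, timedelta

# (visitor_id, timestamp) pairs taken from the sample tables above.
PAGE_VIEWS = [
    ("V001", date(2025, 11, 10)),
    ("V001", date(2025, 11, 10)),
    ("V001", date(2025, 11, 11)),
    ("V002", date(2025, 11, 11)),
    ("V003", date(2025, 11, 12)),
    ("V003", date(2025, 11, 12)),
]
ORDERS = [("V001", date(2025, 11, 12)), ("V003", date(2025, 11, 14))]


def count_in_window(events, visitor_id, anchor, start_days, end_days):
    """Count events with anchor+start < ts <= anchor+end (assumed convention)."""
    lo = anchor + timedelta(days=start_days)
    hi = anchor + timedelta(days=end_days)
    return sum(1 for vid, ts in events if vid == visitor_id and lo < ts <= hi)


def label_for(visitor_id, anchor, min_views=3):
    """None if the WHERE filter excludes the visitor, else the binary target."""
    if count_in_window(PAGE_VIEWS, visitor_id, anchor, -7, 0) < min_views:
        return None
    return count_in_window(ORDERS, visitor_id, anchor, 0, 7) > 0


anchor = date(2025, 11, 11)
v001_label = label_for("V001", anchor)
v002_label = label_for("V002", anchor)
```

With the anchor on 2025-11-11, V001 has three page views in the trailing window and an order in the following 7 days, so the visitor passes the filter and receives a positive label; V002 has a single page view and is filtered out.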

Prediction output

Every entity gets a score, updated continuously

VISITOR_ID	TIMESTAMP	TARGET_PRED	True_PROB
V001	2025-11-10	True	0.92
V002	2025-11-11	False	0.08
V003	2025-11-12	True	0.79
V004	2025-11-12	False	0.15

Understand why

Every prediction includes feature attributions — no black boxes

Visitor V001 — paid_search / desktop

Predicted: True (92% probability)

Top contributing features

Feature	Value	Attribution
Visited cart page within 24 hours of product view	True	32%
Time on pricing page > 90 seconds	120 sec	26%
Source: paid_search (highest-converting channel)	paid_search	20%
3+ page views in last 7 days	3 views	14%
Desktop device (higher AOV segment)	desktop	8%

Feature attributions are computed automatically for every prediction. No separate tooling required.

PQL Documentation

Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.


Python SDK

Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.


Explainability Docs

Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.


Frequently asked questions

Common questions about propensity to buy

What conversion rate improvement can propensity scoring deliver?

Visitor-level propensity scores typically lift conversion rates by 2-3x when used for personalized offers and real-time bid adjustments. The improvement comes from concentrating marketing spend on high-probability converters and personalizing the experience for visitors on the cusp of purchasing.

Can propensity models work for anonymous visitors?

Yes. Graph models score visitors based on their behavioral patterns (page sequences, session depth, traffic source) without requiring login or identity. The model learns from the full graph of visitors, page views, and orders, so anonymous visitors with strong behavioral signals can score just as high as known users.

How fast do propensity scores need to update?

For real-time personalization and on-site merchandising, scores should update with each page view (sub-second latency). For retargeting bid adjustment and email personalization, hourly or daily updates are sufficient. The highest ROI comes from real-time scoring because the conversion window for most visitors is minutes, not days.

What is the relationship between propensity scoring and retargeting?

Propensity scores make retargeting dramatically more efficient. Instead of bidding the same amount for all site visitors, you bid high for visitors with 80%+ propensity (who need one more nudge) and bid low or exclude visitors with under 10% propensity (who were never going to convert). This typically reduces retargeting cost-per-acquisition by 40-60%.
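The tiering described above can be sketched as a small function. The 80% and 10% cut-offs come from the answer; the multiplier values (1.5, 1.0, 0.0) and the function name are hypothetical and would be tuned per campaign.

```python
def bid_multiplier(propensity, high=0.80, low=0.10):
    """Map a propensity score in [0, 1] to a retargeting bid multiplier."""
    if not 0.0 <= propensity <= 1.0:
        raise ValueError("propensity must be between 0 and 1")
    if propensity >= high:
        return 1.5  # hypothetical boost for visitors who need one more nudge
    if propensity < low:
        return 0.0  # exclude visitors unlikely to convert
    return 1.0  # default bid for everyone in between


# Scores from the sample prediction output on this page.
scores = {"V001": 0.92, "V002": 0.08, "V003": 0.79, "V004": 0.15}
multipliers = {visitor: bid_multiplier(p) for visitor, p in scores.items()}
```

On the sample scores, V001 gets the boosted bid, V002 is excluded, and V003 and V004 stay at the default bid.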

Bottom line: Visitor-level propensity scores lift conversion rates by 2.5x when used for personalized offers, retargeting bid adjustments, and on-site merchandising — turning anonymous traffic into attributable revenue.

Related use cases

Explore more acquisition use cases

Use Case #1: Lead Scoring

Use Case #5: Marketing Attribution

Use Case #7: Trial-to-Paid Conversion


Topics covered

propensity to buy model, purchase prediction AI, visitor conversion prediction, e-commerce propensity scoring, graph neural network e-commerce, KumoRFM, relational deep learning, real-time purchase prediction, conversion rate optimization, behavioral scoring, predictive analytics e-commerce

From a leadership team with proven experience

Vanja Josifovski

CEO and Co-Founder, ex-CTO Airbnb, ex-CTO Pinterest

Jure Leskovec

Co-Founder & Chief Scientist, Stanford Professor

Hema Raghavan

Co-Founder & Head of Engineering, ex-AI Lead, LinkedIn

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
