Lead Scoring
“Which leads will convert to a paying customer in the next 30 days?”
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example
Which leads will convert to a paying customer in the next 30 days?
Sales teams waste 60% of their time on leads that never convert. Current lead scoring uses demographic rules (company size plus job title) and misses behavioral and relational signals entirely. The result is bloated pipelines, burned-out SDRs, and missed quota. If you could score leads by actual conversion probability, reps could focus on the 20% of leads that drive 80% of pipeline, shortening sales cycles and improving win rates.
Quick answer
Lead scoring predicts which leads will convert to paying customers within a defined time window. The best models go beyond demographic rules (company size + job title) by learning from behavioral signals, product usage patterns, and relational connections such as 'leads whose colleagues at the same company already purchased.' Leads prioritized by graph-based scoring convert 3.2x more often than those prioritized by rule-based approaches.
Approaches compared
4 ways to solve this problem
1. Rule-Based Scoring (BANT / Demographic)
Assign points based on demographic fit: company size, industry, job title, budget, authority, need, timing. The default in most CRMs (Salesforce, HubSpot).
Best for
Teams with no ML capability that need a quick scoring system. Good for initial lead qualification when the ICP is well-defined.
Watch out for
Rules miss behavioral signals entirely. A VP at a Fortune 500 company who never opens emails scores higher than a director at a mid-market company who requested a demo, viewed pricing, and invited colleagues to a webinar. Expect 30-40% accuracy.
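A minimal sketch of demographic point scoring, assuming made-up point tables (`TITLE_POINTS` and `SIZE_POINTS` are hypothetical, not defaults from Salesforce or HubSpot). It replays the example above: the director's behavioral fields are on the record, but the rules never read them, so the disengaged VP still ranks first.

```python
# Hypothetical point tables; real CRMs let admins configure their own rules.
TITLE_POINTS = {"vp": 30, "director": 20, "manager": 10}
SIZE_POINTS = {"enterprise": 30, "mid_market": 15, "smb": 5}


def rule_score(lead: dict) -> int:
    """Sum demographic points. Behavioral fields are never consulted."""
    return TITLE_POINTS.get(lead["title"], 0) + SIZE_POINTS.get(lead["company_size"], 0)


vp = {"title": "vp", "company_size": "enterprise",
      "opened_emails": 0, "requested_demo": False}
director = {"title": "director", "company_size": "mid_market",
            "opened_emails": 12, "requested_demo": True,
            "viewed_pricing": True, "invited_colleagues": 3}

print(rule_score(vp), rule_score(director))  # 60 35
```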
2. Logistic Regression on CRM Data
Train a logistic regression model on CRM fields: source, industry, engagement score, days in pipeline. Interpretable coefficients help sales understand the scoring logic.
Best for
Teams that want data-driven scoring with interpretable results. A solid step up from rules when the feature set is limited to CRM fields.
Watch out for
Limited to features explicitly stored in the CRM. Misses cross-lead patterns (colleagues at the same company), product usage signals, and temporal sequences (pricing page viewed after webinar attendance).
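A minimal logistic regression sketch, assuming three hypothetical CRM fields already scaled to the 0-1 range. Plain batch gradient descent stands in for a library such as scikit-learn so the example has no dependencies; the signed coefficients it prints are the interpretable part a sales team would read.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error: sigmoid(w.x + b) minus the 0/1 label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical CRM rows: [is_referral, engagement_score, days_in_pipeline]
X = [[1, 0.9, 0.1], [0, 0.2, 0.8], [1, 0.7, 0.3],
     [0, 0.1, 0.9], [0, 0.8, 0.2], [1, 0.3, 0.7]]
y = [1, 0, 1, 0, 1, 0]  # 1 = converted

w, b = fit_logistic(X, y)
for name, coef in zip(["is_referral", "engagement_score", "days_in_pipeline"], w):
    print(f"{name}: {coef:+.2f}")
```

On this toy data, engagement gets a positive weight and time in pipeline a negative one. Every input is a column from the lead's own CRM row, which is exactly the limitation described above.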
3. Gradient Boosted Trees (XGBoost) on Enriched Features
Train XGBoost on hand-crafted features from CRM, marketing automation, and product analytics. Captures non-linear relationships and feature interactions.
Best for
Teams with ML engineers and data from multiple systems. Good accuracy when features are well-engineered and regularly refreshed.
Watch out for
Feature engineering is the bottleneck. Each new signal (product usage, support interactions, webinar attendance) requires a new feature pipeline. The model treats each lead independently, missing network effects like 'other leads at the same company are also engaging.'
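The feature-engineering bottleneck looks roughly like this: one hand-written rule per signal, computed here from the sample LEADS and ACTIVITIES tables shown later on this page. The feature names are hypothetical, and the resulting rows would then be fed to a model such as XGBoost (not shown). Note that each lead's features come only from its own activities.

```python
from datetime import date

# lead_id -> signup_date (from the LEADS table)
leads = {
    "L001": date(2025, 11, 1),
    "L002": date(2025, 11, 3),
    "L003": date(2025, 11, 5),
    "L004": date(2025, 11, 7),
}
# (activity_id, lead_id, activity_type, page, timestamp) from the ACTIVITIES table
activities = [
    ("A101", "L001", "page_view", "/pricing", date(2025, 11, 2)),
    ("A102", "L001", "demo_request", "/demo", date(2025, 11, 4)),
    ("A103", "L002", "page_view", "/blog", date(2025, 11, 4)),
    ("A104", "L003", "page_view", "/pricing", date(2025, 11, 6)),
    ("A105", "L004", "email_click", "/case-study", date(2025, 11, 8)),
]


def build_features(lead_id: str, signup: date) -> dict:
    """Hypothetical hand-crafted features for one lead, in isolation."""
    own = [a for a in activities if a[1] == lead_id]
    pricing_days = [(a[4] - signup).days for a in own if a[3] == "/pricing"]
    return {
        "n_activities": len(own),
        "viewed_pricing_within_3d": any(d <= 3 for d in pricing_days),
        "requested_demo": any(a[2] == "demo_request" for a in own),
    }


features = {lid: build_features(lid, s) for lid, s in leads.items()}
print(features["L001"])
# {'n_activities': 2, 'viewed_pricing_within_3d': True, 'requested_demo': True}
```

Every new signal (support tickets, product usage) means another rule like these, plus the pipeline to keep it fresh.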
4. KumoRFM (Graph Neural Networks on Relational Data)
Connects leads, activities, orders, and company relationships into a heterogeneous graph. Automatically discovers signals like 'leads who viewed pricing after a webinar' and 'leads whose colleagues at the same company already purchased.' Zero feature engineering required.
Best for
B2B sales teams with CRM, product usage, and marketing data who want maximum accuracy without building feature pipelines.
Watch out for
Requires activity-level data with timestamps (page views, email interactions, demo requests). If your CRM only has static lead attributes without behavioral data, enrich it first.
Key metric: on the SAP SALT benchmark, multi-table relational models reach 91% accuracy vs 75% for single-table ML and 63% for rule-based scoring. Leads prioritized by graph-based scoring convert 3.2x more often.
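One way to picture the heterogeneous graph from approach 4: typed edges keyed by (source type, relation, destination type), built from the sample LEADS, ACTIVITIES, and ORDERS tables shown later on this page. This is an illustrative assumption, not Kumo's internal representation; the relation names (`works_at`, `in_industry`, `performed`, `placed`) are hypothetical.

```python
from collections import defaultdict

leads = [("L001", "Acme Corp", "Finance"), ("L002", "Beta Ltd", "Retail"),
         ("L003", "Gamma Inc", "Healthcare"), ("L004", "Delta Co", "Finance")]
activities = [("A101", "L001"), ("A102", "L001"), ("A103", "L002"),
              ("A104", "L003"), ("A105", "L004")]
orders = [("O501", "L001"), ("O502", "L004")]

# edges[(src_type, relation, dst_type)] -> list of (src_id, dst_id)
edges = defaultdict(list)
for lead_id, company, industry in leads:
    edges[("lead", "works_at", "company")].append((lead_id, company))
    edges[("lead", "in_industry", "industry")].append((lead_id, industry))
for act_id, lead_id in activities:
    edges[("lead", "performed", "activity")].append((lead_id, act_id))
for order_id, lead_id in orders:
    edges[("lead", "placed", "order")].append((lead_id, order_id))


def neighbors(node_id: str, relation: str) -> list:
    """Destination ids reachable from node_id over one relation type."""
    return [dst
            for (_, rel, _), pairs in edges.items() if rel == relation
            for src, dst in pairs if src == node_id]


print(neighbors("L001", "performed"))  # ['A101', 'A102']
```

Because leads sharing a company or industry node become two hops apart, signals can flow between them without anyone writing a join by hand.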
Why relational data changes the answer
Lead L001 (Acme Corp, Finance, webinar source) viewed the pricing page 1 day after signup and requested a demo 2 days later. A flat model sees these as two features: 'viewed pricing = true' and 'requested demo = true.' But the sequence matters: pricing-then-demo is a much stronger signal than demo-then-pricing (which often indicates comparison shopping). The relational graph captures this temporal ordering automatically.
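A flat encoding stores both events as booleans and loses their order. The hand-written check below, a hypothetical helper rather than anything Kumo exposes, recovers that ordering for a single lead; the point of the graph approach is that this kind of pattern is learned rather than coded by hand.

```python
from datetime import date


def flat_features(events):
    """What a flat model sees: order-free booleans."""
    return {
        "viewed_pricing": any(page == "/pricing" for _, page, _ in events),
        "requested_demo": any(kind == "demo_request" for kind, _, _ in events),
    }


def pricing_before_demo(events):
    """True when the lead's first pricing view precedes its first demo request."""
    ordered = sorted(events, key=lambda e: e[2])
    pricing = next((i for i, (_, page, _) in enumerate(ordered) if page == "/pricing"), None)
    demo = next((i for i, (kind, _, _) in enumerate(ordered) if kind == "demo_request"), None)
    return pricing is not None and demo is not None and pricing < demo


# (activity_type, page, date); L001 is taken from the ACTIVITIES table
l001 = [("page_view", "/pricing", date(2025, 11, 2)),
        ("demo_request", "/demo", date(2025, 11, 4))]
# Hypothetical comparison shopper: demo first, pricing afterwards
shopper = [("demo_request", "/demo", date(2025, 11, 2)),
           ("page_view", "/pricing", date(2025, 11, 4))]

print(flat_features(l001) == flat_features(shopper))  # True
print(pricing_before_demo(l001), pricing_before_demo(shopper))  # True False
```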
More importantly, L001 is connected to 2 existing customers in the Finance industry. These connections live in the company-to-company relationship graph, not in any single lead's attributes. The GNN propagates signals from converted leads to unconverted ones at the same company, in the same industry, or with similar engagement patterns. On the SAP SALT benchmark, models with access to multi-table relational signals achieve 91% accuracy vs 75% for single-table models. For lead scoring specifically, the relational signals (colleague conversions, industry peer behavior, engagement sequence patterns) are often more predictive than the lead's own demographic attributes. This is why graph-based lead scoring delivers 3.2x higher conversion rates than rule-based scoring.
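A simplified picture of that propagation: one round of unweighted mean aggregation, where each lead receives the average conversion label of the other leads at the same company or in the same industry. Lead L005 (a colleague of L001 at Acme Corp) is hypothetical, added so that an unconverted lead has converted peers. A real GNN learns weighted, multi-hop aggregations across many relation types, and a production version should only aggregate conversions that happened before the prediction time to avoid label leakage.

```python
from collections import defaultdict

# Hypothetical leads: (lead_id, company, industry, converted)
leads = [
    ("L001", "Acme Corp", "Finance", 1),
    ("L002", "Beta Ltd", "Retail", 0),
    ("L003", "Gamma Inc", "Healthcare", 0),
    ("L004", "Delta Co", "Finance", 1),
    ("L005", "Acme Corp", "Finance", 0),  # hypothetical colleague of L001
]


def peer_signal(key: int) -> dict:
    """One hop of mean aggregation: the average conversion label of the
    other leads sharing column `key` (1 = company, 2 = industry)."""
    groups = defaultdict(list)
    for row in leads:
        groups[row[key]].append(row)
    signal = {}
    for row in leads:
        peers = [p[3] for p in groups[row[key]] if p[0] != row[0]]
        signal[row[0]] = sum(peers) / len(peers) if peers else 0.0
    return signal


company_signal = peer_signal(1)
industry_signal = peer_signal(2)
print(company_signal["L005"], industry_signal["L005"])  # 1.0 1.0
```

L005 has no conversion of its own, yet both neighborhood signals are strong; a per-lead feature table would never see them.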
Scoring leads with demographic rules is like hiring employees based solely on their resume. A relational model also checks their references (connected customers), reviews their work samples (product usage behavior), and sees that their former colleagues who joined your company all became top performers. The resume gets you to the interview; the relational context tells you who to hire.
How KumoRFM solves this
Relational intelligence for smarter acquisition
Kumo builds a heterogeneous graph across your CRM, product usage, support interactions, and marketing touchpoints. Instead of relying on hand-crafted rules, the graph neural network automatically discovers signals like 'leads whose colleagues at the same company already purchased' or 'leads who viewed pricing pages after a webinar.' The model learns from every relationship in your data, not just flat lead attributes, and its conversion probabilities drive 3.2x higher conversion rates than rule-based scoring, with zero feature engineering.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
LEADS
| lead_id | company | industry | source | signup_date |
|---|---|---|---|---|
| L001 | Acme Corp | Finance | webinar | 2025-11-01 |
| L002 | Beta Ltd | Retail | organic | 2025-11-03 |
| L003 | Gamma Inc | Healthcare | paid_search | 2025-11-05 |
| L004 | Delta Co | Finance | referral | 2025-11-07 |
ACTIVITIES
| activity_id | lead_id | activity_type | page | timestamp |
|---|---|---|---|---|
| A101 | L001 | page_view | /pricing | 2025-11-02 |
| A102 | L001 | demo_request | /demo | 2025-11-04 |
| A103 | L002 | page_view | /blog | 2025-11-04 |
| A104 | L003 | page_view | /pricing | 2025-11-06 |
| A105 | L004 | email_click | /case-study | 2025-11-08 |
ORDERS
| order_id | lead_id | amount | timestamp |
|---|---|---|---|
| O501 | L001 | $24,000 | 2025-11-15 |
| O502 | L004 | $18,500 | 2025-11-20 |
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT COUNT(ORDERS.*, 0, 30, days) > 0 FOR EACH LEADS.LEAD_ID
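A plain-Python reading of what this query asks for each lead: did the lead place at least one order in the 30 days after the prediction time? This sketch assumes the prediction time is the signup date (matching the TIMESTAMP column in the output below) and a window that excludes its start and includes its end; the PQL documentation is the authority on exact boundary semantics.

```python
from datetime import date, timedelta

# Anchor times (assumed = signup_date) and ORDERS rows from the tables above
signup = {"L001": date(2025, 11, 1), "L002": date(2025, 11, 3),
          "L003": date(2025, 11, 5), "L004": date(2025, 11, 7)}
orders = [("O501", "L001", date(2025, 11, 15)),
          ("O502", "L004", date(2025, 11, 20))]


def label(lead_id: str, anchor: date, horizon_days: int = 30) -> bool:
    """COUNT of orders in (anchor, anchor + horizon] is greater than 0."""
    end = anchor + timedelta(days=horizon_days)
    count = sum(1 for _, lid, ts in orders if lid == lead_id and anchor < ts <= end)
    return count > 0


labels = {lid: label(lid, anchor) for lid, anchor in signup.items()}
print(labels)  # {'L001': True, 'L002': False, 'L003': False, 'L004': True}
```

These are the historical targets a model trains on; the query itself only declares them.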
Prediction output
Every entity gets a score, updated continuously
| LEAD_ID | TIMESTAMP | TARGET_PRED | True_PROB |
|---|---|---|---|
| L001 | 2025-11-01 | True | 0.89 |
| L002 | 2025-11-03 | False | 0.12 |
| L003 | 2025-11-05 | True | 0.74 |
| L004 | 2025-11-07 | True | 0.81 |
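Turning these scores into a call list is a sort plus a cut. In this sketch the 0.5 threshold for `TARGET_PRED` is an assumption that happens to match the sample rows, and `top_fraction` is a hypothetical knob for how much of the ranked queue reps work.

```python
# (LEAD_ID, True_PROB) from the prediction output above
predictions = [("L001", 0.89), ("L002", 0.12), ("L003", 0.74), ("L004", 0.81)]


def target_pred(prob: float, threshold: float = 0.5) -> bool:
    """Assumed decision rule: predict conversion at or above the threshold."""
    return prob >= threshold


def call_list(preds, top_fraction: float = 0.2) -> list:
    """Highest-probability leads first, keeping the top fraction (at least one)."""
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return [lead_id for lead_id, _ in ranked[:k]]


print([target_pred(p) for _, p in predictions])  # [True, False, True, True]
print(call_list(predictions))                     # ['L001']
print(call_list(predictions, top_fraction=0.5))   # ['L001', 'L004']
```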
Understand why
Every prediction includes feature attributions — no black boxes
Lead L001 — Acme Corp
Predicted: True (89% probability)
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| Viewed pricing page within 3 days of signup | True | 34% |
| Requested demo after webinar attendance | True | 27% |
| Company industry | Finance | 18% |
| Lead source | webinar | 13% |
| Connected to existing customers in the same industry | 2 connections | 8% |
Feature attributions are computed automatically for every prediction. No separate tooling required.
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about lead scoring
How accurate is AI lead scoring compared to rule-based scoring?
Graph-based lead scoring delivers 3.2x higher conversion rates than rule-based scoring. On the SAP SALT benchmark, multi-table relational models achieve 91% accuracy vs 75% for single-table ML and 63% for rules. The improvement comes from behavioral sequences, cross-lead relationships, and product usage signals that rules cannot capture.
What data do I need for predictive lead scoring?
At minimum: a leads table with attributes and an activities table with timestamped interactions (page views, email clicks, demo requests). High-value additions include product usage data (for PLG companies), existing customer data (to learn from successful conversions), and company/industry relationship data. The more relational tables you connect, the more cross-lead signals the graph discovers.
How often should lead scores be updated?
Real-time or at least daily. A lead who requested a demo 10 minutes ago should score higher than one who did so last week. Batch scoring on weekly cycles misses the urgency signals that drive conversion. Kumo updates scores as new activity data flows in.
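A hypothetical sketch of event-driven rescoring: select only the leads with activity newer than their last score, so a fresh demo request is picked up immediately. This is one possible pipeline design, not a description of how Kumo schedules updates; `last_scored` and the activity tuples are illustrative.

```python
from datetime import datetime

# Hypothetical: when each lead was last scored
last_scored = {"L001": datetime(2025, 11, 4, 9, 0),
               "L002": datetime(2025, 11, 4, 9, 0)}
# Hypothetical incoming activity stream: (lead_id, timestamp)
new_activity = [("L001", datetime(2025, 11, 4, 9, 50)),  # 50 min after last score
                ("L002", datetime(2025, 11, 3, 16, 0)),  # already reflected in score
                ("L003", datetime(2025, 11, 4, 8, 0))]   # never scored yet


def leads_to_rescore(last_scored: dict, activity: list) -> list:
    """Leads that are unscored or have activity newer than their last score."""
    stale = {lead_id for lead_id, ts in activity
             if lead_id not in last_scored or ts > last_scored[lead_id]}
    return sorted(stale)


print(leads_to_rescore(last_scored, new_activity))  # ['L001', 'L003']
```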
Can lead scoring work for product-led growth (PLG) companies?
Absolutely. PLG companies have the richest lead scoring signals because free-tier users generate product usage data before any sales interaction. Features like 'invited a teammate,' 'used the API,' or 'exported data' are strong conversion predictors. Graph models connect usage data to lead data automatically, scoring leads by actual product engagement rather than marketing activity alone.
Bottom line: Kumo-scored leads convert 3.2x more often than leads prioritized by rule-based scoring. Sales reps reclaim 60% of prospecting time by focusing on leads the model identifies as high-probability converters.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.