Binary Classification · Deduplication

Duplicate Detection

For each record in the CRM, is there a duplicate entry in the system?


A real-world example


Duplicate records inflate customer counts by 15–25%, skew analytics, and cause embarrassing double outreach. Traditional deduplication rules, such as exact email matching, miss variations like "john@acme.com" vs. "j.smith@acme-corp.com". Kumo instead detects duplicates through behavioral overlap: shared purchasing patterns, shared addresses, and overlapping device fingerprints. Each unmerged duplicate costs $10–30 a year in wasted marketing, so enterprises with millions of records face seven-figure annual losses.
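A rough worked example of the cost claim: the sketch below multiplies a record count by the 15–25% duplicate rate and the $10–30 annual cost per unmerged duplicate quoted above. The 1,000,000-record CRM is a hypothetical input, not a figure from this page, and the function name is illustrative.

```python
def annual_duplicate_cost(total_records, dup_rate, cost_per_dup):
    """Estimated yearly waste: number of duplicate records times the annual cost of each."""
    return total_records * dup_rate * cost_per_dup

# Hypothetical CRM size; the rates and per-duplicate costs are the ranges quoted in the text.
TOTAL_RECORDS = 1_000_000
low = annual_duplicate_cost(TOTAL_RECORDS, 0.15, 10)   # best case: 15% duplicates at $10 each
high = annual_duplicate_cost(TOTAL_RECORDS, 0.25, 30)  # worst case: 25% duplicates at $30 each
print(f"${low:,.0f} to ${high:,.0f} per year")
```

At one million records this gives $1,500,000 to $7,500,000 a year, in line with the seven-figure estimate above; the result scales linearly with the number of records.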

How KumoRFM solves this

Relational intelligence for identity resolution

Kumo connects CRM records to their transactions, support interactions, and behavioral signals in a unified relational graph. Instead of comparing email strings, Kumo learns that records R-101 and R-204 share the same purchasing cadence, contact the same support agents, and transact with the same merchants. The binary classifier then predicts whether each record has a duplicate anywhere in the system, flagging matches that deterministic rules would miss.
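To make the contrast concrete, here is a minimal sketch comparing an email-string rule with a simple neighbor-overlap score over a relational graph. The merchant and agent IDs are hypothetical, and the hand-written Jaccard similarity is only a stand-in for the kind of signal Kumo learns; it is not how KumoRFM scores records.

```python
# Hypothetical graph neighbors of each record: the merchants it transacts with and the
# support agents it contacts. These IDs are illustrative only, not data from this page.
neighbors = {
    "R-101": {"merchant:M-17", "merchant:M-42", "agent:A-3"},
    "R-204": {"merchant:M-17", "merchant:M-42", "agent:A-3", "agent:A-9"},
    "R-350": {"merchant:M-88", "agent:A-5"},
}
emails = {"R-101": "john@acme.com", "R-204": "j.smith@acme-corp.com", "R-350": "mlopez@bigco.io"}

def jaccard(a, b):
    """Fraction of graph neighbors two records share (0.0 = none, 1.0 = identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def compare(r1, r2):
    """Deterministic email rule vs. a behavioral-overlap score for one pair of records."""
    return {
        "same_email": emails[r1].lower() == emails[r2].lower(),
        "neighbor_overlap": jaccard(neighbors[r1], neighbors[r2]),
    }

print(compare("R-101", "R-204"))  # {'same_email': False, 'neighbor_overlap': 0.75}
print(compare("R-101", "R-350"))  # {'same_email': False, 'neighbor_overlap': 0.0}
```

The email rule treats all three records as unrelated, while the overlap score separates the R-101/R-204 pair from R-350. A learned model can weigh many such signals together rather than relying on one fixed formula.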

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Step 1: Your data

The relational tables Kumo learns from

RECORDS

record_id | name | email | company | source
R-101 | John Smith | john@acme.com | Acme Corp | website
R-204 | J. Smith | j.smith@acme-corp.com | ACME | trade show
R-350 | Maria Lopez | mlopez@bigco.io | BigCo Inc | referral

MATCH_CANDIDATES

match_id | record_id | candidate_id | similarity_score | timestamp
MC-001 | R-101 | R-204 | 0.82 | 2025-09-14
MC-002 | R-350 | R-612 | 0.74 | 2025-09-14
MC-003 | R-101 | R-550 | 0.45 | 2025-09-15

TRANSACTIONS

txn_id | record_id | amount | timestamp
TXN-8001 | R-101 | $1,249.00 | 2025-09-10
TXN-8002 | R-204 | $1,249.00 | 2025-09-10
TXN-8003 | R-350 | $487.50 | 2025-09-12
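Both MATCH_CANDIDATES and TRANSACTIONS carry a record_id column that points back to RECORDS. As a small illustration of a cross-table signal, the sketch below joins the TRANSACTIONS rows on amount and date to find records that share a purchase: R-101 and R-204 both spent $1,249.00 on 2025-09-10. The function name is hypothetical, and this plain-Python join is for explanation only, not how Kumo ingests tables.

```python
from collections import defaultdict
from decimal import Decimal
from itertools import combinations

# Rows from the TRANSACTIONS table as (txn_id, record_id, amount, date);
# the "$" and "," formatting is stripped so amounts parse as exact decimals.
transactions = [
    ("TXN-8001", "R-101", "1249.00", "2025-09-10"),
    ("TXN-8002", "R-204", "1249.00", "2025-09-10"),
    ("TXN-8003", "R-350", "487.50", "2025-09-12"),
]

def shared_transaction_pairs(rows):
    """Return record-ID pairs that each have a transaction with the same amount on the same date."""
    records_by_key = defaultdict(set)
    for _txn_id, record_id, amount, date in rows:
        records_by_key[(Decimal(amount), date)].add(record_id)
    pairs = set()
    for record_ids in records_by_key.values():
        # Every pair of distinct records that share this (amount, date) key.
        pairs.update(combinations(sorted(record_ids), 2))
    return pairs

print(shared_transaction_pairs(transactions))  # {('R-101', 'R-204')}
```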
Step 2: Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT COUNT(MATCH_CANDIDATES.*
    WHERE MATCH_CANDIDATES.SIMILARITY_SCORE > 0.8,
    0, 30, days) > 0
FOR EACH RECORDS.RECORD_ID
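Read literally, the query labels a record True when at least one MATCH_CANDIDATES row with SIMILARITY_SCORE above 0.8 appears for it in the 30 days after the prediction time. The sketch below computes that label by hand for a hypothetical prediction date of 2025-09-01. Two assumptions are marked in the code: rows link to RECORDS through record_id, and the window runs from the prediction date up to, but not including, day 30. How R-204, which appears here only as a candidate_id, is counted depends on how the table is linked in Kumo, which this page does not show.

```python
from datetime import date, timedelta

# Rows from the MATCH_CANDIDATES table: (match_id, record_id, candidate_id, similarity_score, date).
match_candidates = [
    ("MC-001", "R-101", "R-204", 0.82, date(2025, 9, 14)),
    ("MC-002", "R-350", "R-612", 0.74, date(2025, 9, 14)),
    ("MC-003", "R-101", "R-550", 0.45, date(2025, 9, 15)),
]

def duplicate_label(record_id, anchor, rows, threshold=0.8, horizon_days=30):
    """Approximates COUNT(... WHERE score > 0.8, 0, 30, days) > 0 for one record.

    Assumptions: rows link to the record via record_id (not candidate_id), and the
    window is [anchor, anchor + horizon_days).
    """
    end = anchor + timedelta(days=horizon_days)
    count = sum(
        1
        for _match_id, rid, _candidate, score, ts in rows
        if rid == record_id and score > threshold and anchor <= ts < end
    )
    return count > 0

anchor = date(2025, 9, 1)  # hypothetical prediction time
for rid in ("R-101", "R-350"):
    print(rid, duplicate_label(rid, anchor, match_candidates))
# R-101 True   (MC-001 scores 0.82)
# R-350 False  (MC-002 scores only 0.74)
```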
Step 3: Prediction output

Every entity gets a score, updated continuously

RECORD_ID | TIMESTAMP | TARGET_PRED | True_PROB
R-101 | 2025-10-01 | True | 0.96
R-204 | 2025-10-01 | True | 0.96
R-350 | 2025-10-01 | False | 0.18
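One way to act on these scores is to route high-probability records to a data steward for merge review. The sketch below does that with a hypothetical 0.9 cut-off; the threshold and the function name are assumptions, not Kumo defaults.

```python
# Rows from the prediction output: (record_id, timestamp, target_pred, true_prob).
predictions = [
    ("R-101", "2025-10-01", True, 0.96),
    ("R-204", "2025-10-01", True, 0.96),
    ("R-350", "2025-10-01", False, 0.18),
]

def review_queue(rows, threshold=0.9):
    """Record IDs whose duplicate probability meets `threshold`, highest probability first."""
    flagged = [(record_id, prob) for record_id, _ts, _pred, prob in rows if prob >= threshold]
    # Sort by descending probability, breaking ties by record ID for a stable order.
    flagged.sort(key=lambda item: (-item[1], item[0]))
    return [record_id for record_id, _prob in flagged]

print(review_queue(predictions))  # ['R-101', 'R-204']
```

A higher threshold trades recall for precision, which can be worth it when a wrong merge is costlier than a missed duplicate.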
Step 4: Understand why

Every prediction includes feature attributions — no black boxes

Record R-101 (John Smith, Acme Corp)

Predicted: 96% probability of having a duplicate

Top contributing features

Transaction amount overlap with R-204: exact match (32% attribution)
Company name similarity: 0.88 (24% attribution)
Phone number overlap: same (20% attribution)
Behavioral cadence similarity: 0.91 (14% attribution)
Source channel difference: different (10% attribution)

Feature attributions are computed automatically for every prediction, with no separate tooling required.
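The five percentages for R-101 sum to 100%, which suggests each is a feature's share of the total attribution. This page does not say how Kumo computes them; the sketch below assumes the common convention of dividing each feature's absolute contribution by the sum of absolute contributions. The raw contribution values are hypothetical, chosen so the shares reproduce the 32/24/20/14/10 split shown above.

```python
def attribution_shares(contributions):
    """Convert raw per-feature contributions into percentage shares of total magnitude."""
    total = sum(abs(v) for v in contributions.values())
    if total == 0:
        return {name: 0.0 for name in contributions}
    return {name: 100 * abs(v) / total for name, v in contributions.items()}

# Hypothetical raw contributions for R-101, picked to reproduce the page's percentages.
raw = {
    "Transaction amount overlap with R-204": 0.64,
    "Company name similarity": 0.48,
    "Phone number overlap": 0.40,
    "Behavioral cadence similarity": 0.28,
    "Source channel difference": 0.20,
}
shares = attribution_shares(raw)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.0f}%")
```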

Bottom line: Eliminate the 15–25% of CRM records that are duplicates, correcting inflated customer counts, fixing attribution, and saving $1–5M annually in wasted outreach.
