
Claims Fraud Detection

Is this claim fraudulent?


A real-world example


Insurance fraud costs the US industry $80B+ annually (FBI), with 10% of all P&C claims containing some element of fraud (Coalition Against Insurance Fraud). Special Investigation Units (SIUs) can only investigate 5-10% of flagged claims, and legacy rules-based systems generate 80-90% false positives, burying real fraud in noise. Organized fraud rings are particularly hard to detect because they coordinate across multiple policies, claimants, providers, and repair shops. A single fraud ring can cost an insurer $5-20M before detection.

Quick answer

Graph-based AI detects insurance claims fraud by mapping hidden connections between claimants, providers, repair shops, and adjusters. Unlike rules-based systems that flag individual claims in isolation, relational models spot coordinated fraud rings and soft fraud patterns by analyzing multi-hop relationships across the entire claims network. Top implementations reduce fraud losses by 30-50% while cutting false positives by 40%.

Approaches compared

4 ways to solve this problem

1. Rules-Based Flagging

Hardcoded business rules that flag claims matching known fraud patterns (e.g., claim filed within 30 days of policy inception, amount exceeds threshold). Easy to implement and explain, but rigid and predictable.

Best for

Known, well-defined fraud patterns where speed matters more than coverage.

Watch out for

Fraudsters learn the rules fast. False-positive rates hit 80-90%, burying real fraud in noise. New fraud schemes go undetected until someone writes a new rule.
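A rules engine of this kind can be sketched in a few lines. The thresholds below (a 30-day filing window, a $10,000 amount cap) and the policy inception date are illustrative assumptions, not values from any specific carrier:

```python
from datetime import date

# Illustrative rule thresholds -- assumptions, not industry standards
EARLY_FILING_DAYS = 30      # claim filed soon after policy inception
HIGH_AMOUNT = 10_000        # dollar threshold for large claims

def rule_flags(claim: dict) -> list[str]:
    """Return the names of every hardcoded rule a claim trips."""
    flags = []
    days_since_inception = (claim["loss_date"] - claim["policy_start"]).days
    if days_since_inception <= EARLY_FILING_DAYS:
        flags.append("filed_near_inception")
    if claim["amount"] > HIGH_AMOUNT:
        flags.append("high_amount")
    return flags

# Mirrors claim CLM-9201; the policy_start date is invented for the example
claim = {
    "claim_id": "CLM-9201",
    "amount": 12_400,
    "policy_start": date(2025, 8, 15),
    "loss_date": date(2025, 9, 1),
}
print(rule_flags(claim))  # ['filed_near_inception', 'high_amount']
```

The weakness is visible in the code itself: every pattern must be written by hand, and a fraudster who files on day 31 for $9,900 trips nothing.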

2. Traditional ML (XGBoost / Logistic Regression)

Gradient-boosted trees or logistic regression trained on claim-level features like amount, timing, claimant history, and provider stats. Better than rules because it learns from data, but still operates on flat, single-claim feature vectors.

Best for

Soft fraud on individual claims where the signal is in the claim itself (inflated estimates, suspicious timing).

Watch out for

Cannot see multi-hop connections. A fraud ring where three claimants share a phone number and a body shop requires manual feature engineering that rarely happens.
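The kind of feature a flat model would need can be sketched as follows: a data scientist has to anticipate the connection (here, a shared phone number) and collapse it into a per-claim count before training. The records mirror the CLAIMANTS example table; the helper itself is hypothetical:

```python
from collections import Counter

# Claimant records (mirrors the CLAIMANTS example table)
claimants = [
    {"claimant_id": "CL-801", "phone": "555-0142"},
    {"claimant_id": "CL-802", "phone": "555-0142"},
    {"claimant_id": "CL-803", "phone": "555-0199"},
]

def shared_phone_count(claimants: list[dict]) -> dict[str, int]:
    """For each claimant, count OTHER claimants sharing their phone.
    This single hand-built feature is one of hundreds a flat model
    would need to approximate what a graph model learns directly."""
    phone_freq = Counter(c["phone"] for c in claimants)
    return {c["claimant_id"]: phone_freq[c["phone"]] - 1 for c in claimants}

print(shared_phone_count(claimants))
# {'CL-801': 1, 'CL-802': 1, 'CL-803': 0}
```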

3. Social Network Analysis (SNA)

Graph analytics that map connections between entities and compute centrality, clustering, and community-detection metrics. Good at visualizing suspicious networks for SIU investigators.

Best for

Exploratory fraud ring investigations where analysts need visual maps of connected entities.

Watch out for

SNA produces graph features but does not predict fraud directly. You still need a separate model, and the handoff between graph analysis and prediction introduces gaps.
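A minimal version of this kind of analysis, grouping claims into candidate rings whenever they share a phone or provider, can be sketched with a union-find over shared attribute values (entity IDs echo the example tables; production SNA tools add centrality and community metrics on top):

```python
from collections import defaultdict

def find_rings(claims: list[dict], keys: tuple[str, ...]) -> list[set[str]]:
    """Union claims that share any attribute value; return clusters of size > 1."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Link each claim to a node per shared attribute value
    for c in claims:
        for k in keys:
            union(c["claim_id"], f"{k}:{c[k]}")

    clusters = defaultdict(set)
    for c in claims:
        clusters[find(c["claim_id"])].add(c["claim_id"])
    return [grp for grp in clusters.values() if len(grp) > 1]

claims = [
    {"claim_id": "CLM-9201", "phone": "555-0142", "provider_id": "PRV-101"},
    {"claim_id": "CLM-9202", "phone": "555-0142", "provider_id": "PRV-101"},
    {"claim_id": "CLM-9203", "phone": "555-0199", "provider_id": "PRV-101"},
    {"claim_id": "CLM-9300", "phone": "555-0777", "provider_id": "PRV-102"},
]
# One cluster: CLM-9201/9202/9203 linked via shared phone + provider
print(find_rings(claims, ("phone", "provider_id")))
```

Note what is missing: the cluster is surfaced, but nothing here scores it. That prediction step still requires a separate model, which is exactly the handoff gap described above.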

4. Graph Neural Networks (Kumo's Approach)

End-to-end relational learning that builds a heterogeneous graph across claims, claimants, providers, and shared attributes, then learns fraud patterns directly from the graph structure. No manual feature engineering required.

Best for

Detecting both organized fraud rings and soft fraud at scale. The model learns multi-hop patterns (shared phone + shared provider + timing cluster) automatically.

Watch out for

Requires relational data to be connected. If your claims data lives in completely siloed systems with no joinable keys, you need data integration first.

Key metric: Insurance fraud costs the US industry $80B+ annually (FBI). Graph-based detection reduces false-positive investigation rates by 40% while catching 30-50% more confirmed fraud.

Why relational data changes the answer

Flat-table fraud models treat each claim as an independent row. They can see that Claim CLM-9201 has a high amount and was filed shortly after policy inception, but they cannot see that the claimant shares a phone number with two other recent claimants, all three used the same body shop, and the repair estimates follow an identical dollar pattern. These multi-hop connections are the signature of organized fraud rings, and they are invisible to any model that operates on a single-row feature vector. Manual feature engineering can capture some of these signals (e.g., 'count of claims sharing this phone number'), but it requires a data scientist to anticipate every possible connection pattern in advance.

Relational learning solves this by operating directly on the connected graph. The model walks from a claim to its claimant, to other claims by that claimant, to the providers on those claims, to other claimants using those providers, and learns which connection patterns predict fraud. This is how real investigators think: they pull threads and follow connections. Graph neural networks automate that process at scale, examining millions of connection patterns simultaneously. The result is fraud detection that catches rings months earlier, with 40% fewer false positives, because the model has seen the full picture instead of a flattened summary.
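That investigator-style walk (claim to claimant, to sibling claims, to shared providers) can be sketched as a plain breadth-first traversal over the relational graph. The adjacency mirrors the CLAIM_NETWORK example table; a GNN learns which of these paths predict fraud rather than enumerating them by hand:

```python
from collections import deque

# Adjacency mirroring the CLAIM_NETWORK example table:
# claims connect to claimants and providers, and vice versa.
edges = {
    "CLM-9201": ["CL-801", "PRV-101"],
    "CLM-9202": ["CL-802", "PRV-101"],
    "CLM-9203": ["CL-803", "PRV-101"],
    "CL-801": ["CLM-9201"],
    "CL-802": ["CLM-9202"],
    "CL-803": ["CLM-9203"],
    "PRV-101": ["CLM-9201", "CLM-9202", "CLM-9203"],
}

def multi_hop_neighbors(start: str, hops: int) -> set[str]:
    """All entities reachable from `start` within `hops` edges --
    the neighborhood a GNN aggregates over per message-passing layer."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}

# Two hops from CLM-9201 reach its claimant, its provider, and the
# sibling claims routed through the same body shop.
print(sorted(multi_hop_neighbors("CLM-9201", 2)))
```

One hop sees only the claim's own claimant and provider; the second hop is where the ring becomes visible, which is why single-claim models miss it.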

Detecting fraud from a flat claims table is like trying to find a crime ring by reading individual police reports in alphabetical order. Each report looks like an isolated incident. But pin those reports on a map, draw lines between shared addresses, phone numbers, and associates, and the ring becomes obvious. Graph-based fraud detection is the digital version of that pin-and-string board that every detective show features, except it processes millions of connections in seconds.

How KumoRFM solves this

Relational intelligence built for insurance data

Kumo connects claims, policies, claimants, providers, repair facilities, adjusters, and geographic data into a single relational graph. The model detects that Claim CLM-9201 involves a claimant who shares a phone number with two other recent claimants, all three used the same body shop, and the repair estimates follow an identical pattern. These multi-hop connections reveal fraud rings invisible to single-claim analysis. The graph also catches soft fraud: Claim CLM-9205 has inflated damage estimates based on the vehicle's age, repair-shop pricing patterns, and historical claim amounts for similar incidents.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

1. Your data

The relational tables Kumo learns from

CLAIMS

| claim_id | policy_id | type | amount | loss_date | filed_date |
|---|---|---|---|---|---|
| CLM-9201 | POL-4401 | Auto Collision | $12,400 | 2025-09-01 | 2025-09-03 |
| CLM-9205 | POL-4418 | Auto Collision | $8,900 | 2025-09-05 | 2025-09-06 |
| CLM-9210 | POL-4425 | Property Fire | $45,000 | 2025-09-08 | 2025-09-10 |

CLAIMANTS

| claimant_id | name | phone | address | claims_12mo |
|---|---|---|---|---|
| CL-801 | Michael Torres | 555-0142 | 88 Pine St | 3 |
| CL-802 | Lisa Chen | 555-0142 | 220 Oak Ave | 2 |
| CL-803 | James Wilson | 555-0199 | 88 Pine St | 1 |

PROVIDERS

| provider_id | name | type | avg_estimate | claims_volume |
|---|---|---|---|---|
| PRV-101 | QuickFix Auto Body | Repair Shop | $11,800 | 42/mo |
| PRV-102 | City Auto Repair | Repair Shop | $7,200 | 28/mo |
| PRV-103 | Dr. Smith Chiro | Medical Provider | $4,500 | 65/mo |

CLAIM_NETWORK

| claim_id | claimant_id | provider_id | adjuster_id | shared_attributes |
|---|---|---|---|---|
| CLM-9201 | CL-801 | PRV-101 | ADJ-05 | Phone, Provider |
| CLM-9202 | CL-802 | PRV-101 | ADJ-05 | Phone, Provider |
| CLM-9203 | CL-803 | PRV-101 | ADJ-12 | Address, Provider |
2. Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT BOOL(CLAIMS.FRAUD_CONFIRMED = 'True', 0, 0, days)
FOR EACH CLAIMS.CLAIM_ID
WHERE CLAIMS.AMOUNT > 5000
3. Prediction output

Every entity gets a score, updated continuously

| CLAIM_ID | AMOUNT | FRAUD_SCORE | RING_DETECTED | SIU_PRIORITY |
|---|---|---|---|---|
| CLM-9201 | $12,400 | 0.91 | Ring-A (3 claims) | Critical |
| CLM-9205 | $8,900 | 0.62 | None | High |
| CLM-9210 | $45,000 | 0.15 | None | Low |
4. Understand why

Every prediction includes feature attributions — no black boxes

Claim CLM-9201 (Auto Collision, $12,400)

Predicted: 91% fraud probability, Ring-A detected

Top contributing features

| Feature | Evidence | Attribution |
|---|---|---|
| Shared phone with other claimants | 2 matches | 28% |
| Common repair shop (high-volume) | PRV-101, 42/mo | 24% |
| Shared address pattern | 88 Pine St | 20% |
| Claim timing cluster | 3 in 10 days | 17% |
| Estimate vs vehicle value ratio | 68% of ACV | 11% |

Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability

Frequently asked questions

Common questions about claims fraud detection

How does AI detect insurance fraud rings?

AI detects fraud rings by building a graph of connections between claimants, providers, repair shops, and shared attributes (phone numbers, addresses, bank accounts). When multiple entities cluster together with unusual overlap, the model flags the entire cluster for investigation. This catches coordinated schemes that look normal at the individual claim level.

What is the false positive rate for AI fraud detection in insurance?

Rules-based systems typically generate 80-90% false positives. Traditional ML models bring that down to 50-60%. Graph-based approaches like Kumo reduce false positives to 35-45% because they use relationship context to distinguish genuinely connected claims from coincidental similarities.

How long does it take to detect insurance fraud with machine learning?

Traditional audit-based detection takes 12-24 months. Rules-based systems catch known patterns within days but miss new schemes. Graph-based ML models can detect emerging fraud rings within 2-4 weeks of the first coordinated claim, because the connection pattern becomes visible before enough individual claims accumulate to trigger traditional thresholds.

Can AI detect soft fraud in insurance claims?

Yes. Soft fraud (inflated damages, exaggerated injuries, added items to legitimate claims) accounts for the majority of fraud losses. AI models detect soft fraud by comparing claim details against historical patterns for the same vehicle type, repair shop, injury type, and geographic area. When an estimate is 2-3x the expected amount for the circumstances, the model flags it.
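The 2-3x comparison can be sketched as a simple ratio against a historical baseline. The baseline table and segment names below are invented for illustration, not real actuarial figures:

```python
# Hypothetical historical baselines: median repair estimate by
# (vehicle segment, incident type) -- illustrative numbers only.
BASELINES = {
    ("midsize_sedan", "rear_collision"): 4_200,
    ("midsize_sedan", "front_collision"): 5_800,
}

def soft_fraud_ratio(estimate: float, segment: str, incident: str) -> float:
    """Claim estimate divided by the expected amount for similar incidents.
    Ratios near 1.0 look normal; sustained 2-3x is a soft-fraud signal."""
    return estimate / BASELINES[(segment, incident)]

ratio = soft_fraud_ratio(12_400, "midsize_sedan", "rear_collision")
print(f"{ratio:.1f}x expected")  # roughly 3.0x -> flag for review
```

In practice the baseline itself is the hard part: it must condition on vehicle age, shop, region, and injury type, which is precisely the context a relational model pulls in automatically.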

Bottom line: Reduce fraud losses by 30-50% and cut SIU false-positive rates by 40%, saving $40-80M annually for a top-20 P&C insurer.

Topics covered

insurance claims fraud detection · claims fraud AI · insurance fraud analytics · SIU optimization · graph neural network insurance fraud · KumoRFM · relational deep learning insurance · fraudulent claims prediction · insurance fraud ring detection · claims investigation AI

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.