Claims Fraud Detection
“Is this claim fraudulent?”
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.
A real-world example
Is this claim fraudulent?
Insurance fraud costs the US industry $80B+ annually (FBI), with 10% of all P&C claims containing some element of fraud (Coalition Against Insurance Fraud). Special Investigation Units (SIUs) can only investigate 5-10% of flagged claims, and legacy rules-based systems generate 80-90% false positives, burying real fraud in noise. Organized fraud rings are particularly hard to detect because they coordinate across multiple policies, claimants, providers, and repair shops. A single fraud ring can cost an insurer $5-20M before detection.
Quick answer
Graph-based AI detects insurance claims fraud by mapping hidden connections between claimants, providers, repair shops, and adjusters. Unlike rules-based systems that flag individual claims in isolation, relational models spot coordinated fraud rings and soft fraud patterns by analyzing multi-hop relationships across the entire claims network. Top implementations reduce fraud losses by 30-50% while cutting false positives by 40%.
Approaches compared
4 ways to solve this problem
1. Rules-Based Flagging
Hardcoded business rules that flag claims matching known fraud patterns (e.g., claim filed within 30 days of policy inception, amount exceeds threshold). Easy to implement and explain, but rigid and predictable.
Best for
Known, well-defined fraud patterns where speed matters more than coverage.
Watch out for
Fraudsters learn the rules fast. False-positive rates hit 80-90%, burying real fraud in noise. New fraud schemes go undetected until someone writes a new rule.
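A rules engine of this kind is often just a list of named predicates evaluated per claim. A minimal sketch, with made-up rule names, thresholds, and claim fields:

```python
from datetime import date

# Hypothetical claim records; field names are illustrative, not a real schema.
claims = [
    {"claim_id": "CLM-9201", "amount": 12_400,
     "policy_start": date(2025, 8, 20), "loss_date": date(2025, 9, 1)},
    {"claim_id": "CLM-9210", "amount": 45_000,
     "policy_start": date(2023, 1, 15), "loss_date": date(2025, 9, 8)},
]

# Each rule is a (name, predicate) pair -- the entire "model" is this list.
RULES = [
    ("early_claim", lambda c: (c["loss_date"] - c["policy_start"]).days <= 30),
    ("high_amount", lambda c: c["amount"] > 25_000),
]

def flag(claim):
    """Return the names of every rule the claim trips."""
    return [name for name, pred in RULES if pred(claim)]

for c in claims:
    print(c["claim_id"], flag(c))
```

The weakness is visible in the code itself: every pattern must be written by hand, and a fraudster who files on day 31 with a $24,900 estimate sails through.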
2. Traditional ML (XGBoost / Logistic Regression)
Gradient-boosted trees or logistic regression trained on claim-level features like amount, timing, claimant history, and provider stats. Better than rules because it learns from data, but still operates on flat, single-claim feature vectors.
Best for
Soft fraud on individual claims where the signal is in the claim itself (inflated estimates, suspicious timing).
Watch out for
Cannot see multi-hop connections. A fraud ring where three claimants share a phone number and a body shop requires manual feature engineering that rarely happens.
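The flat-feature setup can be sketched as logistic regression over single-claim rows (toy data and labels; gradient descent written out to stay dependency-light). Note that nothing in the feature vector can express a cross-claim link like a shared phone number:

```python
import numpy as np

# Flat claim-level features: [amount_zscore, days_to_file, claimant_claims_12mo]
# Rows and labels are invented for illustration; 1 = confirmed fraud.
X = np.array([[2.1, 2, 3], [0.3, 14, 0], [1.8, 1, 2], [-0.5, 30, 1]], float)
y = np.array([1, 0, 1, 0], float)

# Plain logistic regression: the model only ever sees one claim's row,
# never the surrounding claims network.
Xs = (X - X.mean(0)) / X.std(0)          # standardize each feature
w, b = np.zeros(3), 0.0
for _ in range(500):                      # batch gradient descent
    p = 1 / (1 + np.exp(-(Xs @ w + b)))
    w -= 0.1 * Xs.T @ (p - y) / len(y)
    b -= 0.1 * (p - y).mean()

score = 1 / (1 + np.exp(-(Xs @ w + b)))   # per-claim fraud score
print(np.round(score, 2))
```

To give this model any graph awareness, someone has to precompute features like "claims sharing this claimant's phone," which is exactly the manual step that rarely happens.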
3. Social Network Analysis (SNA)
Graph analytics that map connections between entities and compute centrality, clustering, and community-detection metrics. Good at visualizing suspicious networks for SIU investigators.
Best for
Exploratory fraud ring investigations where analysts need visual maps of connected entities.
Watch out for
SNA produces graph features but does not predict fraud directly. You still need a separate model, and the handoff between graph analysis and prediction introduces gaps.
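A common SNA starting point is degree centrality: count how many distinct claims touch each entity and surface the hubs. A toy sketch with illustrative claim-entity edges:

```python
from collections import defaultdict

# Edges between claims and the entities they touch (illustrative data).
edges = [
    ("CLM-9201", "PRV-101"), ("CLM-9202", "PRV-101"), ("CLM-9203", "PRV-101"),
    ("CLM-9205", "PRV-102"),
    ("CLM-9201", "555-0142"), ("CLM-9202", "555-0142"),
]

# Degree centrality: entities touched by unusually many claims are hubs
# worth an investigator's attention.
degree = defaultdict(set)
for claim, entity in edges:
    degree[entity].add(claim)

hubs = sorted(degree, key=lambda e: len(degree[e]), reverse=True)
print(hubs[0], len(degree[hubs[0]]))
```

Note the output is a ranked list of hub entities, not a fraud probability; a separate predictive model still has to consume these graph features, which is precisely the handoff gap described above.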
4. Graph Neural Networks (Kumo's Approach)
End-to-end relational learning that builds a heterogeneous graph across claims, claimants, providers, and shared attributes, then learns fraud patterns directly from the graph structure. No manual feature engineering required.
Best for
Detecting both organized fraud rings and soft fraud at scale. The model learns multi-hop patterns (shared phone + shared provider + timing cluster) automatically.
Watch out for
Requires relational data to be connected. If your claims data lives in completely siloed systems with no joinable keys, you need data integration first.
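The core GNN operation is message passing over the heterogeneous graph. A stripped-down sketch with fixed mixing weights (a trained model learns these) and made-up two-dimensional node features:

```python
import numpy as np

# Tiny heterogeneous graph: claims, claimants, a provider (toy features).
feats = {
    "CLM-9201": np.array([1.0, 0.0]), "CLM-9202": np.array([0.8, 0.1]),
    "CL-801":  np.array([0.0, 1.0]), "CL-802":  np.array([0.1, 0.9]),
    "PRV-101": np.array([0.5, 0.5]),
}
edges = [("CLM-9201", "CL-801"), ("CLM-9202", "CL-802"),
         ("CLM-9201", "PRV-101"), ("CLM-9202", "PRV-101")]

# Build an undirected adjacency list.
nbrs = {n: [] for n in feats}
for a, b in edges:
    nbrs[a].append(b); nbrs[b].append(a)

# One round of mean-aggregation message passing: each node's new state
# mixes its own features with its neighbors'. Both mixing weights are
# fixed at 0.5 here for illustration; a trained GNN learns them.
h = {n: 0.5 * feats[n] + 0.5 * np.mean([feats[m] for m in nbrs[n]], axis=0)
     for n in feats}
# After one step, CLM-9201's state already carries information from
# PRV-101 -- after a second step, from every claim PRV-101 touches.
```

Stacking a few such rounds is what lets the model "see" shared phone + shared provider + timing cluster as one pattern, without anyone hand-coding those features.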
Key metric: Insurance fraud costs the US industry $80B+ annually (FBI). Graph-based detection reduces false-positive investigation rates by 40% while catching 30-50% more confirmed fraud.
Why relational data changes the answer
Flat-table fraud models treat each claim as an independent row. They can see that Claim CLM-9201 has a high amount and was filed shortly after policy inception, but they cannot see that the claimant shares a phone number with two other recent claimants, all three used the same body shop, and the repair estimates follow an identical dollar pattern. These multi-hop connections are the signature of organized fraud rings, and they are invisible to any model that operates on a single-row feature vector. Manual feature engineering can capture some of these signals (e.g., 'count of claims sharing this phone number'), but it requires a data scientist to anticipate every possible connection pattern in advance.
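The hand-engineered signal mentioned above ('count of claims sharing this phone number') looks like this in code, and a data scientist would have to write one such snippet for every connection pattern they can anticipate:

```python
from collections import Counter

# One engineered graph feature, built by hand: for each claim, how many
# other recent claims share the claimant's phone number. Data is illustrative.
claims = [
    ("CLM-9201", "555-0142"),
    ("CLM-9202", "555-0142"),
    ("CLM-9203", "555-0199"),
]

phone_counts = Counter(phone for _, phone in claims)
shared_phone = {cid: phone_counts[phone] - 1 for cid, phone in claims}
print(shared_phone)  # {'CLM-9201': 1, 'CLM-9202': 1, 'CLM-9203': 0}
```

Multiply this by every joinable attribute (addresses, bank accounts, providers, adjusters) and every hop depth, and the combinatorics of manual feature engineering become the bottleneck.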
Relational learning solves this by operating directly on the connected graph. The model walks from a claim to its claimant, to other claims by that claimant, to the providers on those claims, to other claimants using those providers, and learns which connection patterns predict fraud. This is how real investigators think: they pull threads and follow connections. Graph neural networks automate that process at scale, examining millions of connection patterns simultaneously. The result is fraud detection that catches rings months earlier, with 40% fewer false positives, because the model has seen the full picture instead of a flattened summary.
Detecting fraud from a flat claims table is like trying to find a crime ring by reading individual police reports in alphabetical order. Each report looks like an isolated incident. But pin those reports on a map, draw lines between shared addresses, phone numbers, and associates, and the ring becomes obvious. Graph-based fraud detection is the digital version of that pin-and-string board that every detective show features, except it processes millions of connections in seconds.
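The pin-and-string board translates directly into connected components over shared attributes: pin each claim to its phone numbers, addresses, and shops, then read off the clusters. A sketch with illustrative data and a tiny union-find:

```python
from collections import defaultdict

# Each claim's "pins": attributes it shares with the world (illustrative).
shared = {
    "CLM-9201": {"555-0142", "PRV-101"},
    "CLM-9202": {"555-0142", "PRV-101"},
    "CLM-9203": {"88 Pine St", "PRV-101"},
    "CLM-9210": {"555-0300"},
}

# Invert: attribute -> claims pinned to it.
pins = defaultdict(set)
for claim, attrs in shared.items():
    for a in attrs:
        pins[a].add(claim)

# Union claims connected through any shared pin (union-find with
# path halving).
parent = {c: c for c in shared}
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x
for group in pins.values():
    first, *rest = sorted(group)
    for c in rest:
        parent[find(c)] = find(first)

# Each component is a candidate ring.
rings = defaultdict(set)
for c in shared:
    rings[find(c)].add(c)
print(sorted(map(sorted, rings.values())))
```

Three claims collapse into one cluster through a shared phone and body shop, while the unconnected claim stays alone; that cluster-level view is what single-row models never see.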
How KumoRFM solves this
Relational intelligence built for insurance data
Kumo connects claims, policies, claimants, providers, repair facilities, adjusters, and geographic data into a single relational graph. The model detects that Claim CLM-9201 involves a claimant who shares a phone number with one recent claimant and an address with another, and that all three routed repairs through the same body shop. These multi-hop connections reveal fraud rings invisible to single-claim analysis. The graph also catches soft fraud: Claim CLM-9205 shows inflated damage estimates relative to the vehicle's age, the repair shop's pricing patterns, and historical claim amounts for similar incidents.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
CLAIMS
| claim_id | policy_id | type | amount | loss_date | filed_date |
|---|---|---|---|---|---|
| CLM-9201 | POL-4401 | Auto Collision | $12,400 | 2025-09-01 | 2025-09-03 |
| CLM-9205 | POL-4418 | Auto Collision | $8,900 | 2025-09-05 | 2025-09-06 |
| CLM-9210 | POL-4425 | Property Fire | $45,000 | 2025-09-08 | 2025-09-10 |
CLAIMANTS
| claimant_id | name | phone | address | claims_12mo |
|---|---|---|---|---|
| CL-801 | Michael Torres | 555-0142 | 88 Pine St | 3 |
| CL-802 | Lisa Chen | 555-0142 | 220 Oak Ave | 2 |
| CL-803 | James Wilson | 555-0199 | 88 Pine St | 1 |
PROVIDERS
| provider_id | name | type | avg_estimate | claims_volume |
|---|---|---|---|---|
| PRV-101 | QuickFix Auto Body | Repair Shop | $11,800 | 42/mo |
| PRV-102 | City Auto Repair | Repair Shop | $7,200 | 28/mo |
| PRV-103 | Dr. Smith Chiro | Medical Provider | $4,500 | 65/mo |
CLAIM_NETWORK
| claim_id | claimant_id | provider_id | adjuster_id | shared_attributes |
|---|---|---|---|---|
| CLM-9201 | CL-801 | PRV-101 | ADJ-05 | Phone, Provider |
| CLM-9202 | CL-802 | PRV-101 | ADJ-05 | Phone, Provider |
| CLM-9203 | CL-803 | PRV-101 | ADJ-12 | Address, Provider |
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT BOOL(CLAIMS.FRAUD_CONFIRMED = 'True', 0, 0, days) FOR EACH CLAIMS.CLAIM_ID WHERE CLAIMS.AMOUNT > 5000
Prediction output
Every entity gets a score, updated continuously
| CLAIM_ID | AMOUNT | FRAUD_SCORE | RING_DETECTED | SIU_PRIORITY |
|---|---|---|---|---|
| CLM-9201 | $12,400 | 0.91 | Ring-A (3 claims) | Critical |
| CLM-9205 | $8,900 | 0.62 | None | High |
| CLM-9210 | $45,000 | 0.15 | None | Low |
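The SIU_PRIORITY column can be read as a triage rule layered on top of the model's outputs. The thresholds below are hypothetical, chosen only to reproduce the sample table; a real deployment would tune them to SIU capacity:

```python
# Hypothetical SIU triage on top of model output: ring membership
# escalates straight to Critical; otherwise threshold the fraud score.
def siu_priority(fraud_score, ring_detected):
    if ring_detected:
        return "Critical"
    if fraud_score >= 0.5:
        return "High"
    return "Low"

# (score, ring?) pairs mirroring the sample output table above.
rows = [(0.91, True), (0.62, False), (0.15, False)]
print([siu_priority(s, r) for s, r in rows])  # ['Critical', 'High', 'Low']
```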
Understand why
Every prediction includes feature attributions — no black boxes
Claim CLM-9201 (Auto Collision, $12,400)
Predicted: 91% fraud probability, Ring-A detected
Top contributing features
Shared phone with other claimants
2 matches
28% attribution
Common repair shop (high-volume)
PRV-101, 42/mo
24% attribution
Shared address pattern
88 Pine St
20% attribution
Claim timing cluster
3 in 10 days
17% attribution
Estimate vs vehicle value ratio
68% of ACV
11% attribution
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
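For intuition, here is how attribution shares fall out of a purely linear scorer: each feature's contribution divided by the total. The weights below are invented to reproduce the percentages above; attributions for a real graph model come from gradient- or perturbation-based methods, not this formula:

```python
# Toy attribution for a linear risk score. Weights and binary feature
# values are illustrative assumptions, not Kumo's actual model.
weights = {"shared_phone": 1.4, "common_shop": 1.2, "shared_address": 1.0,
           "timing_cluster": 0.85, "estimate_ratio": 0.55}
values = {f: 1 for f in weights}  # all five signals fire on this claim

# Contribution = weight * value; share = contribution / total score.
contrib = {f: weights[f] * values[f] for f in weights}
total = sum(contrib.values())
shares = {f: round(100 * c / total) for f, c in contrib.items()}
print(shares)
```

The useful property, linear or not, is that the shares sum to the whole score, so an investigator can see exactly which connections drove the flag.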
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about claims fraud detection
How does AI detect insurance fraud rings?
AI detects fraud rings by building a graph of connections between claimants, providers, repair shops, and shared attributes (phone numbers, addresses, bank accounts). When multiple entities cluster together with unusual overlap, the model flags the entire cluster for investigation. This catches coordinated schemes that look normal at the individual claim level.
What is the false positive rate for AI fraud detection in insurance?
Rules-based systems typically generate 80-90% false positives. Traditional ML models bring that down to 50-60%. Graph-based approaches like Kumo reduce false positives to 35-45% because they use relationship context to distinguish genuinely connected claims from coincidental similarities.
How long does it take to detect insurance fraud with machine learning?
Traditional audit-based detection takes 12-24 months. Rules-based systems catch known patterns within days but miss new schemes. Graph-based ML models can detect emerging fraud rings within 2-4 weeks of the first coordinated claim, because the connection pattern becomes visible before enough individual claims accumulate to trigger traditional thresholds.
Can AI detect soft fraud in insurance claims?
Yes. Soft fraud (inflated damages, exaggerated injuries, added items to legitimate claims) accounts for the majority of fraud losses. AI models detect soft fraud by comparing claim details against historical patterns for the same vehicle type, repair shop, injury type, and geographic area. When an estimate is 2-3x the expected amount for the circumstances, the model flags it.
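The "2-3x the expected amount" check can be sketched as a simple ratio against historical estimates for comparable incidents. Data, the comparison key, and the threshold are all illustrative:

```python
# Soft-fraud check sketch: compare an estimate against the historical
# mean for similar incidents and flag large multiples.
historical = {"Auto Collision/sedan": [4100, 4500, 3900, 4300]}

def inflated(incident, estimate, factor=2.0):
    """Return (flag, ratio) for an estimate vs. the historical mean."""
    mean = sum(historical[incident]) / len(historical[incident])
    return estimate >= factor * mean, estimate / mean

is_inflated, ratio = inflated("Auto Collision/sedan", 8900)
print(is_inflated, round(ratio, 2))
```

In practice the comparison group is conditioned on far more context (vehicle age, shop, geography), which is where a learned model outperforms a lookup table like this.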
Bottom line: Reduce fraud losses by 30-50% and cut SIU false-positive rates by 40%, saving $40-80M annually for a top-20 P&C insurer.
Related use cases
Explore more insurance use cases
Topics covered
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.