
Claims Fraud Detection

Is this claim fraudulent?


A real-world example


Insurance fraud costs the US industry $80B+ annually (FBI), with 10% of all P&C claims containing some element of fraud (Coalition Against Insurance Fraud). Special Investigation Units (SIUs) can only investigate 5-10% of flagged claims, and legacy rules-based systems generate 80-90% false positives, burying real fraud in noise. Organized fraud rings are particularly hard to detect because they coordinate across multiple policies, claimants, providers, and repair shops. A single fraud ring can cost an insurer $5-20M before detection.

Quick answer

Graph-based AI detects insurance claims fraud by mapping hidden connections between claimants, providers, repair shops, and adjusters. Unlike rules-based systems that flag individual claims in isolation, relational models spot coordinated fraud rings and soft fraud patterns by analyzing multi-hop relationships across the entire claims network. Top implementations reduce fraud losses by 30-50% while cutting false positives by 40%.

Approaches compared

4 ways to solve this problem

1. Rules-Based Flagging

Hardcoded business rules that flag claims matching known fraud patterns (e.g., claim filed within 30 days of policy inception, amount exceeds threshold). Easy to implement and explain, but rigid and predictable.

Best for

Known, well-defined fraud patterns where speed matters more than coverage.

Watch out for

Fraudsters learn the rules fast. False-positive rates hit 80-90%, burying real fraud in noise. New fraud schemes go undetected until someone writes a new rule.
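A rules engine of this kind can be sketched in a few lines. The thresholds below (a 30-day filing window, a $10,000 amount cap) and the policy inception date are illustrative assumptions, not values from any specific carrier:

```python
from datetime import date

# Illustrative rule thresholds -- assumptions, not industry standards
EARLY_FILING_DAYS = 30      # claim filed soon after policy inception
HIGH_AMOUNT = 10_000        # dollar threshold for large claims

def rule_flags(claim: dict) -> list[str]:
    """Return the names of every hardcoded rule a claim trips."""
    flags = []
    days_since_inception = (claim["loss_date"] - claim["policy_start"]).days
    if days_since_inception <= EARLY_FILING_DAYS:
        flags.append("filed_near_inception")
    if claim["amount"] > HIGH_AMOUNT:
        flags.append("high_amount")
    return flags

# Mirrors claim CLM-9201; the policy_start date is invented for the example
claim = {
    "claim_id": "CLM-9201",
    "amount": 12_400,
    "policy_start": date(2025, 8, 15),
    "loss_date": date(2025, 9, 1),
}
print(rule_flags(claim))  # ['filed_near_inception', 'high_amount']
```

The weakness is visible in the code itself: every pattern must be written by hand, and a fraudster who files on day 31 for $9,900 trips nothing.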

2. Traditional ML (XGBoost / Logistic Regression)

Gradient-boosted trees or logistic regression trained on claim-level features like amount, timing, claimant history, and provider stats. Better than rules because it learns from data, but still operates on flat, single-claim feature vectors.

Best for

Soft fraud on individual claims where the signal is in the claim itself (inflated estimates, suspicious timing).

Watch out for

Cannot see multi-hop connections. A fraud ring where three claimants share a phone number and a body shop requires manual feature engineering that rarely happens.
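The kind of feature a flat model would need can be sketched as follows: a data scientist has to anticipate the connection (here, a shared phone number) and collapse it into a per-claim count before training. The records mirror the CLAIMANTS example table; the helper itself is hypothetical:

```python
from collections import Counter

# Claimant records (mirrors the CLAIMANTS example table)
claimants = [
    {"claimant_id": "CL-801", "phone": "555-0142"},
    {"claimant_id": "CL-802", "phone": "555-0142"},
    {"claimant_id": "CL-803", "phone": "555-0199"},
]

def shared_phone_count(claimants: list[dict]) -> dict[str, int]:
    """For each claimant, count OTHER claimants sharing their phone.
    This single hand-built feature is one of hundreds a flat model
    would need to approximate what a graph model learns directly."""
    phone_freq = Counter(c["phone"] for c in claimants)
    return {c["claimant_id"]: phone_freq[c["phone"]] - 1 for c in claimants}

print(shared_phone_count(claimants))
# {'CL-801': 1, 'CL-802': 1, 'CL-803': 0}
```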

3. Social Network Analysis (SNA)

Graph analytics that map connections between entities and compute centrality, clustering, and community-detection metrics. Good at visualizing suspicious networks for SIU investigators.

Best for

Exploratory fraud ring investigations where analysts need visual maps of connected entities.

Watch out for

SNA produces graph features but does not predict fraud directly. You still need a separate model, and the handoff between graph analysis and prediction introduces gaps.
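A minimal version of this kind of analysis, grouping claims into candidate rings whenever they share a phone or provider, can be sketched with a union-find over shared attribute values (entity IDs echo the example tables; production SNA tools add centrality and community metrics on top):

```python
from collections import defaultdict

def find_rings(claims: list[dict], keys: tuple[str, ...]) -> list[set[str]]:
    """Union claims that share any attribute value; return clusters of size > 1."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Link each claim to a node per shared attribute value
    for c in claims:
        for k in keys:
            union(c["claim_id"], f"{k}:{c[k]}")

    clusters = defaultdict(set)
    for c in claims:
        clusters[find(c["claim_id"])].add(c["claim_id"])
    return [grp for grp in clusters.values() if len(grp) > 1]

claims = [
    {"claim_id": "CLM-9201", "phone": "555-0142", "provider_id": "PRV-101"},
    {"claim_id": "CLM-9202", "phone": "555-0142", "provider_id": "PRV-101"},
    {"claim_id": "CLM-9203", "phone": "555-0199", "provider_id": "PRV-101"},
    {"claim_id": "CLM-9300", "phone": "555-0777", "provider_id": "PRV-102"},
]
# One cluster: CLM-9201/9202/9203 linked via shared phone + provider
print(find_rings(claims, ("phone", "provider_id")))
```

Note what is missing: the cluster is surfaced, but nothing here scores it. That prediction step still requires a separate model, which is exactly the handoff gap described above.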

4. Graph Neural Networks (Kumo's Approach)

End-to-end relational learning that builds a heterogeneous graph across claims, claimants, providers, and shared attributes, then learns fraud patterns directly from the graph structure. No manual feature engineering required.

Best for

Detecting both organized fraud rings and soft fraud at scale. The model learns multi-hop patterns (shared phone + shared provider + timing cluster) automatically.

Watch out for

Requires relational data to be connected. If your claims data lives in completely siloed systems with no joinable keys, you need data integration first.

Key metric: Insurance fraud costs the US industry $80B+ annually (FBI). Graph-based detection reduces false-positive investigation rates by 40% while catching 30-50% more confirmed fraud.

Why relational data changes the answer

Flat-table fraud models treat each claim as an independent row. They can see that Claim CLM-9201 has a high amount and was filed shortly after policy inception, but they cannot see that the claimant shares a phone number with two other recent claimants, all three used the same body shop, and the repair estimates follow an identical dollar pattern. These multi-hop connections are the signature of organized fraud rings, and they are invisible to any model that operates on a single-row feature vector. Manual feature engineering can capture some of these signals (e.g., 'count of claims sharing this phone number'), but it requires a data scientist to anticipate every possible connection pattern in advance.

Relational learning solves this by operating directly on the connected graph. The model walks from a claim to its claimant, to other claims by that claimant, to the providers on those claims, to other claimants using those providers, and learns which connection patterns predict fraud. This is how real investigators think: they pull threads and follow connections. Graph neural networks automate that process at scale, examining millions of connection patterns simultaneously. The result is fraud detection that catches rings months earlier, with 40% fewer false positives, because the model has seen the full picture instead of a flattened summary.
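That investigator-style walk (claim to claimant, to sibling claims, to shared providers) can be sketched as a plain breadth-first traversal over the relational graph. The adjacency mirrors the CLAIM_NETWORK example table; a GNN learns which of these paths predict fraud rather than enumerating them by hand:

```python
from collections import deque

# Adjacency mirroring the CLAIM_NETWORK example table:
# claims connect to claimants and providers, and vice versa.
edges = {
    "CLM-9201": ["CL-801", "PRV-101"],
    "CLM-9202": ["CL-802", "PRV-101"],
    "CLM-9203": ["CL-803", "PRV-101"],
    "CL-801": ["CLM-9201"],
    "CL-802": ["CLM-9202"],
    "CL-803": ["CLM-9203"],
    "PRV-101": ["CLM-9201", "CLM-9202", "CLM-9203"],
}

def multi_hop_neighbors(start: str, hops: int) -> set[str]:
    """All entities reachable from `start` within `hops` edges --
    the neighborhood a GNN aggregates over per message-passing layer."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}

# Two hops from CLM-9201 reach its claimant, its provider, and the
# sibling claims routed through the same body shop.
print(sorted(multi_hop_neighbors("CLM-9201", 2)))
```

One hop sees only the claim's own claimant and provider; the second hop is where the ring becomes visible, which is why single-claim models miss it.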

Detecting fraud from a flat claims table is like trying to find a crime ring by reading individual police reports in alphabetical order. Each report looks like an isolated incident. But pin those reports on a map, draw lines between shared addresses, phone numbers, and associates, and the ring becomes obvious. Graph-based fraud detection is the digital version of that pin-and-string board that every detective show features, except it processes millions of connections in seconds.

How KumoRFM solves this

Relational intelligence built for insurance data

Kumo connects claims, policies, claimants, providers, repair facilities, adjusters, and geographic data into a single relational graph. The model detects that Claim CLM-9201 involves a claimant who shares a phone number with two other recent claimants, all three used the same body shop, and the repair estimates follow an identical pattern. These multi-hop connections reveal fraud rings invisible to single-claim analysis. The graph also catches soft fraud: Claim CLM-9205 has inflated damage estimates based on the vehicle's age, repair-shop pricing patterns, and historical claim amounts for similar incidents.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

1. Your data

The relational tables Kumo learns from

CLAIMS

| claim_id | policy_id | type | amount | loss_date | filed_date |
|---|---|---|---|---|---|
| CLM-9201 | POL-4401 | Auto Collision | $12,400 | 2025-09-01 | 2025-09-03 |
| CLM-9205 | POL-4418 | Auto Collision | $8,900 | 2025-09-05 | 2025-09-06 |
| CLM-9210 | POL-4425 | Property Fire | $45,000 | 2025-09-08 | 2025-09-10 |

CLAIMANTS

| claimant_id | name | phone | address | claims_12mo |
|---|---|---|---|---|
| CL-801 | Michael Torres | 555-0142 | 88 Pine St | 3 |
| CL-802 | Lisa Chen | 555-0142 | 220 Oak Ave | 2 |
| CL-803 | James Wilson | 555-0199 | 88 Pine St | 1 |

PROVIDERS

| provider_id | name | type | avg_estimate | claims_volume |
|---|---|---|---|---|
| PRV-101 | QuickFix Auto Body | Repair Shop | $11,800 | 42/mo |
| PRV-102 | City Auto Repair | Repair Shop | $7,200 | 28/mo |
| PRV-103 | Dr. Smith Chiro | Medical Provider | $4,500 | 65/mo |

CLAIM_NETWORK

| claim_id | claimant_id | provider_id | adjuster_id | shared_attributes |
|---|---|---|---|---|
| CLM-9201 | CL-801 | PRV-101 | ADJ-05 | Phone, Provider |
| CLM-9202 | CL-802 | PRV-101 | ADJ-05 | Phone, Provider |
| CLM-9203 | CL-803 | PRV-101 | ADJ-12 | Address, Provider |
2. Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT BOOL(CLAIMS.FRAUD_CONFIRMED = 'True', 0, 0, days)
FOR EACH CLAIMS.CLAIM_ID
WHERE CLAIMS.AMOUNT > 5000
3. Prediction output

Every entity gets a score, updated continuously

| CLAIM_ID | AMOUNT | FRAUD_SCORE | RING_DETECTED | SIU_PRIORITY |
|---|---|---|---|---|
| CLM-9201 | $12,400 | 0.91 | Ring-A (3 claims) | Critical |
| CLM-9205 | $8,900 | 0.62 | None | High |
| CLM-9210 | $45,000 | 0.15 | None | Low |
4. Understand why

Every prediction includes feature attributions — no black boxes

Claim CLM-9201 (Auto Collision, $12,400)

Predicted: 91% fraud probability, Ring-A detected

Top contributing features

| Feature | Evidence | Attribution |
|---|---|---|
| Shared phone with other claimants | 2 matches | 28% |
| Common repair shop (high-volume) | PRV-101, 42/mo | 24% |
| Shared address pattern | 88 Pine St | 20% |
| Claim timing cluster | 3 in 10 days | 17% |
| Estimate vs vehicle value ratio | 68% of ACV | 11% |

Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability

Frequently asked questions

Common questions about claims fraud detection

How does AI detect insurance fraud rings?

AI detects fraud rings by building a graph of connections between claimants, providers, repair shops, and shared attributes (phone numbers, addresses, bank accounts). When multiple entities cluster together with unusual overlap, the model flags the entire cluster for investigation. This catches coordinated schemes that look normal at the individual claim level.

What is the false positive rate for AI fraud detection in insurance?

Rules-based systems typically generate 80-90% false positives. Traditional ML models bring that down to 50-60%. Graph-based approaches like Kumo reduce false positives to 35-45% because they use relationship context to distinguish genuinely connected claims from coincidental similarities.

How long does it take to detect insurance fraud with machine learning?

Traditional audit-based detection takes 12-24 months. Rules-based systems catch known patterns within days but miss new schemes. Graph-based ML models can detect emerging fraud rings within 2-4 weeks of the first coordinated claim, because the connection pattern becomes visible before enough individual claims accumulate to trigger traditional thresholds.

Can AI detect soft fraud in insurance claims?

Yes. Soft fraud (inflated damages, exaggerated injuries, added items to legitimate claims) accounts for the majority of fraud losses. AI models detect soft fraud by comparing claim details against historical patterns for the same vehicle type, repair shop, injury type, and geographic area. When an estimate is 2-3x the expected amount for the circumstances, the model flags it.
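The 2-3x comparison can be sketched as a simple ratio against a historical baseline. The baseline table and segment names below are invented for illustration, not real actuarial figures:

```python
# Hypothetical historical baselines: median repair estimate by
# (vehicle segment, incident type) -- illustrative numbers only.
BASELINES = {
    ("midsize_sedan", "rear_collision"): 4_200,
    ("midsize_sedan", "front_collision"): 5_800,
}

def soft_fraud_ratio(estimate: float, segment: str, incident: str) -> float:
    """Claim estimate divided by the expected amount for similar incidents.
    Ratios near 1.0 look normal; sustained 2-3x is a soft-fraud signal."""
    return estimate / BASELINES[(segment, incident)]

ratio = soft_fraud_ratio(12_400, "midsize_sedan", "rear_collision")
print(f"{ratio:.1f}x expected")  # roughly 3.0x -> flag for review
```

In practice the baseline itself is the hard part: it must condition on vehicle age, shop, region, and injury type, which is precisely the context a relational model pulls in automatically.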

Bottom line: Reduce fraud losses by 30-50% and cut SIU false-positive rates by 40%, saving $40-80M annually for a top-20 P&C insurer.

Topics covered

insurance claims fraud detection · claims fraud AI · insurance fraud analytics · SIU optimization · graph neural network insurance fraud · KumoRFM · relational deep learning insurance · fraudulent claims prediction · insurance fraud ring detection · claims investigation AI

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.