Insurance fraud costs the US economy $80 billion annually, according to the Coalition Against Insurance Fraud. That is $80 billion in fictitious claims, inflated injuries, staged accidents, and provider billing schemes that flow straight through to higher premiums for everyone else. The FBI estimates the average American family pays $400-700 per year in excess premiums to cover it.
Every major insurer runs fraud detection models. They flag suspicious claims based on features like claim amount, injury type, time since policy inception, and claimant history. These models catch some fraud. They miss the fraud that costs the most: organized rings where providers, claimants, attorneys, and sometimes adjusters coordinate across dozens of claims.
The difference between catching individual fraudulent claims and catching fraud rings is the difference between recovering thousands and recovering millions. And the only way to see a ring is to see the graph.
claims — sample auto insurance data
| claim_id | policy_id | claimant | injury_type | amount | provider | attorney |
|---|---|---|---|---|---|---|
| CL-501 | POL-220 | J. Martinez | Whiplash | $32,000 | Dr. A. Shah | Law Office Chen |
| CL-502 | POL-318 | R. Thompson | Whiplash | $28,500 | Dr. A. Shah | Law Office Chen |
| CL-503 | POL-445 | K. Williams | Soft tissue | $35,200 | Dr. B. Patel | Law Office Chen |
| CL-504 | POL-112 | M. Garcia | Whiplash | $31,800 | Dr. A. Shah | Law Office Chen |
| CL-505 | POL-667 | D. Brown | Back pain | $29,400 | Dr. B. Patel | Self |
Four claims (CL-501 through CL-504) share the same attorney. Three of them share the same provider. All four involve soft-tissue injuries within a 60-day window. Individually plausible. As a network: a fraud ring.
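The shared-entity pattern in the sample table can be surfaced with a simple group-by. The sketch below is illustrative only, not any insurer's production logic: the helper name `shared_entities` and the `min_claims=3` cutoff are assumptions chosen for this example.

```python
from collections import defaultdict

# Rows copied from the sample claims table (only the fields needed here).
CLAIMS = [
    {"claim_id": "CL-501", "provider": "Dr. A. Shah", "attorney": "Law Office Chen"},
    {"claim_id": "CL-502", "provider": "Dr. A. Shah", "attorney": "Law Office Chen"},
    {"claim_id": "CL-503", "provider": "Dr. B. Patel", "attorney": "Law Office Chen"},
    {"claim_id": "CL-504", "provider": "Dr. A. Shah", "attorney": "Law Office Chen"},
    {"claim_id": "CL-505", "provider": "Dr. B. Patel", "attorney": "Self"},
]


def shared_entities(claims, field, min_claims=3):
    """Map each value of `field` to the claims that share it.

    Only values appearing on at least `min_claims` claims are kept.
    The cutoff of 3 is an assumption for this toy example.
    """
    groups = defaultdict(list)
    for claim in claims:
        groups[claim[field]].append(claim["claim_id"])
    return {value: ids for value, ids in groups.items() if len(ids) >= min_claims}


by_attorney = shared_entities(CLAIMS, "attorney")
by_provider = shared_entities(CLAIMS, "provider")
print(by_attorney)
print(by_provider)
```

On five rows this is trivial. The point is that the signal lives in the provider and attorney columns taken together across claims, which a per-claim score never looks at.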
claims_triage — AI-predicted outcomes at FNOL
| claim_id | predicted_settlement | complexity | fraud_prob | litigation_risk | recommended_action |
|---|---|---|---|---|---|
| CL-501 | $31,200 | High | 0.89 | 0.78 | Route to SIU |
| CL-502 | $27,800 | High | 0.86 | 0.72 | Route to SIU |
| CL-503 | $34,100 | High | 0.91 | 0.81 | Route to SIU |
| CL-505 | $18,200 | Low | 0.06 | 0.12 | Auto-adjudicate |
Graph-based triage identifies the ring members immediately and routes them to SIU, while auto-adjudicating the legitimate claim.
Why insurance data is naturally relational
An insurance company's data model typically spans 20-40 tables. Policies. Policyholders. Claims. Claimants (who may differ from policyholders). Injuries and diagnoses. Treatments. Providers (doctors, hospitals, repair shops). Adjusters. Agents. Attorneys. Payments. Reserves. Reinsurance contracts. Geographic risk zones.
Every claim connects to a web of entities. A single auto accident claim might involve: the policyholder, 2 claimants, 3 medical providers, 1 auto repair shop, 1 attorney, 1 adjuster, 8 treatment events, and 12 payment transactions. That is 29 entities across 8 tables, connected through foreign keys that define who-treated-whom, who-represents-whom, and who-paid-whom.
Traditional fraud models collapse this structure into a single row: "Claim #12345: amount $45,000, injury type whiplash, time since inception 89 days, claimant has 2 prior claims." The entire relational context is lost.
Fraud rings: the invisible threat
A fraud ring in auto insurance might work like this. A group of 20 people stage low-speed collisions in parking lots. Each files a separate claim with a different insurer. They all visit the same 3 medical providers, who bill for extensive treatment of soft-tissue injuries that cannot be verified by imaging. The same attorney represents all 20 claimants. The same tow company handles all the vehicles.
Each individual claim looks plausible. The amounts are within normal ranges. The injuries are consistent with the accidents. The providers are licensed. A flat model scoring each claim independently rates them as medium-risk at worst.
In the graph, the pattern is unmistakable. Twenty claimants connected to the same three providers, the same attorney, and the same tow company, with claims filed within a 60-day window. The hub-and-spoke topology screams organized fraud. But you can only see it if you look at the relationships.
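One simple way to make "seeing the graph" concrete is to treat claims and the entities on them as nodes and pull out connected components. The sketch below uses union-find for that. It is a deliberately minimal stand-in, not the model the article describes, and the claim IDs, entity names, and `min_size` cutoff are all hypothetical.

```python
from collections import defaultdict


def find(parent, node):
    """Return the representative of node's set, shortening the path as it goes."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def claim_clusters(claim_entities, min_size=3):
    """Group claims that are linked through a chain of shared entities.

    claim_entities maps a claim id to the set of entities on that claim
    (providers, attorneys, tow companies). Returns sorted clusters that
    contain at least `min_size` claims.
    """
    parent = {}
    for claim, entities in claim_entities.items():
        claim_node = ("claim", claim)
        parent.setdefault(claim_node, claim_node)
        for entity in entities:
            entity_node = ("entity", entity)
            parent.setdefault(entity_node, entity_node)
            root_a = find(parent, claim_node)
            root_b = find(parent, entity_node)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two components
    clusters = defaultdict(set)
    for claim in claim_entities:
        clusters[find(parent, ("claim", claim))].add(claim)
    return [sorted(c) for c in clusters.values() if len(c) >= min_size]


# Hypothetical toy data: three linked claims and one unrelated claim.
example = {
    "C1": {"Clinic X", "Attorney Q", "Tow Co"},
    "C2": {"Clinic Y", "Attorney Q", "Tow Co"},
    "C3": {"Clinic X", "Tow Co"},
    "C4": {"Clinic Z"},
}
rings = claim_clusters(example)
print(rings)
```

Plain connected components over-link on real data, since a large hospital or a busy law firm will connect thousands of honest claims. In practice this would at most be a candidate-generation step, with timing, injury type, and entity volume used to narrow the candidates.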
Provider fraud in health insurance
Health insurance fraud follows similar relational patterns. A provider billing for services not rendered, upcoding procedures, or unbundling bundled services creates anomalies that are visible in the provider-patient-treatment-diagnosis graph.
A provider with 200 patients who all received the same expensive diagnostic test within 30 days is suspicious. A provider whose patients are referred exclusively by two other providers, and whose billing volume spiked 300% in 6 months, is more suspicious. A provider whose patient overlap with known fraudulent providers exceeds statistical norms is a strong lead.
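Two of the signals above reduce to short calculations. The sketch below shows a patient-overlap score and a billing-growth percentage. Jaccard similarity is an assumption picked here for illustration (the article does not name an overlap measure), and the patient IDs and dollar amounts are made up.

```python
def patient_overlap(patients_a, patients_b):
    """Jaccard similarity between two providers' patient sets (0.0 to 1.0)."""
    a, b = set(patients_a), set(patients_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def billing_growth_pct(previous_total, current_total):
    """Percentage change in billed amount between two periods."""
    if previous_total <= 0:
        raise ValueError("previous_total must be positive")
    return (current_total - previous_total) / previous_total * 100.0


# Hypothetical inputs: a provider under review and a known fraudulent provider.
under_review = {"p1", "p2", "p3", "p4"}
known_fraud = {"p3", "p4", "p5", "p6"}
overlap = patient_overlap(under_review, known_fraud)

# Hypothetical billing totals six months apart.
growth = billing_growth_pct(50_000, 200_000)
print(round(overlap, 3), growth)
```

Whether a given overlap "exceeds statistical norms" depends on a baseline for comparable providers, which this sketch does not attempt to estimate.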
The National Health Care Anti-Fraud Association conservatively estimates that health care fraud costs about $68 billion annually in the US, roughly 3% of total health care spending; other estimates run as high as 10%. Graph-based detection can identify suspicious provider networks 2-3x faster than rule-based systems while reducing false positive rates by 40-50%.
Underwriting: precision risk pricing
Underwriting has historically relied on actuarial tables: age, location, vehicle type, driving record, credit score. These features are predictive, but they treat each applicant as an independent data point. Graph-based underwriting adds the relational dimension.
applicants — identical actuarial profiles
| applicant | age | vehicle | zip_code | credit_score | driving_record |
|---|---|---|---|---|---|
| Applicant A | 35 | 2023 Honda Accord | 97201 | 720 | Clean (5 years) |
| Applicant B | 34 | 2023 Honda Accord | 97205 | 715 | Clean (6 years) |
Virtually identical profiles. Traditional underwriting assigns them the same risk tier and premium within $20/year.
geographic_risk_graph — what the relational model sees
| metric | Applicant A (zip 97201) | Applicant B (zip 97205) |
|---|---|---|
| Auto theft claims (5-year) | 2 claims | 47 claims |
| Collision claims (5-year) | 8 claims | 31 claims |
| Avg claim severity | $4,200 | $11,800 |
| Nearby repair shop fraud rate | 1% | 12% |
| Similar policyholder loss ratio | 52% | 89% |
Applicant B's zip code has roughly 23x as many auto theft claims and nearly 4x the collision claims. Policyholders with similar profiles in that zip have an 89% loss ratio versus 52%. The expected loss differs by 3-5x despite nearly identical flat features.
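The per-metric multiples can be checked directly from the table. The snippet below only divides the zip 97205 column by the zip 97201 column. The dictionary keys are names invented for this example, and it does not try to derive the 3-5x expected-loss figure, which would need a pricing model the article does not spell out.

```python
# (zip 97201 value, zip 97205 value), copied from the geographic risk table.
metrics = {
    "auto_theft_claims_5y": (2, 47),
    "collision_claims_5y": (8, 31),
    "avg_claim_severity_usd": (4200, 11800),
    "repair_shop_fraud_rate_pct": (1, 12),
    "similar_policyholder_loss_ratio_pct": (52, 89),
}

# How many times larger each metric is for Applicant B's zip code.
ratios = {name: round(b / a, 1) for name, (a, b) in metrics.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```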
Beyond geography, relational underwriting considers: the claims history of similar policyholders (not just the applicant), the risk profiles of providers in the applicant's likely treatment network, correlation patterns between policy features and claims outcomes across the existing book, and economic indicators from connected entities (employers, industries, regions).
Traditional insurance AI
- Scores individual claims with flat features
- Underwriting uses actuarial tables and credit scores
- Fraud detection catches individual bad actors
- Claims triage based on simple rules
- 5-10% of claims cost lost to undetected fraud
Graph-based insurance AI
- Analyzes claim-provider-claimant-attorney network
- Underwriting uses relational risk signals
- Fraud detection catches organized rings
- Claims triage predicts complexity, fraud, and litigation
- 40-60% more fraud detected with fewer false positives
PQL Query
PREDICT fraud_ring_probability FOR EACH claims.claim_id WHERE claims.filed_date > '2025-01-01'
One query scores every new claim against the full claim-provider-claimant-attorney network. The model detects ring structures, shared entities, and coordinated filing patterns.
Output
| claim_id | fraud_ring_prob | ring_size | shared_entities | action |
|---|---|---|---|---|
| CL-501 | 0.89 | 4 | Provider + Attorney + timing | SIU investigation |
| CL-502 | 0.86 | 4 | Provider + Attorney + timing | SIU investigation |
| CL-503 | 0.91 | 4 | Attorney + timing + injury type | SIU investigation |
| CL-504 | 0.88 | 4 | Provider + Attorney + timing | SIU investigation |
| CL-505 | 0.06 | 0 | No network anomalies | Auto-adjudicate |
Claims management: speed and accuracy
The claims process is where insurers deliver on their promise. Speed matters: customers who wait more than 30 days for claims resolution are 2.5x more likely to switch carriers at renewal. Accuracy matters: underpayment leads to litigation, overpayment erodes margins.
Intelligent triage
When a claim arrives at first notice of loss (FNOL), graph-based models can predict: expected settlement amount (within 15-20% of the final figure), claims complexity (simple, moderate, complex), litigation likelihood, fraud probability, and subrogation potential. These predictions enable immediate routing: simple claims go to auto-adjudication or junior adjusters, complex claims go to senior adjusters, suspicious claims go to SIU.
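The routing step itself can be a thin rule layer on top of the model's predictions. The sketch below is one possible arrangement, not a documented workflow. The function name `route_claim`, the 0.8 fraud threshold, and the choice to send mid-complexity claims to a junior adjuster are all assumptions, and the complexity labels follow the `claims_triage` table ("High", "Low").

```python
def route_claim(fraud_prob, complexity, fraud_threshold=0.8):
    """Pick a queue for a claim from its predicted fraud probability and complexity.

    The threshold and the handling of mid-complexity claims are
    illustrative assumptions, not values taken from the article.
    """
    if fraud_prob >= fraud_threshold:
        return "Route to SIU"        # suspicious claims are checked first
    if complexity == "High":
        return "Senior adjuster"
    if complexity == "Low":
        return "Auto-adjudicate"
    return "Junior adjuster"         # anything in between


# (claim_id, complexity, fraud_prob) copied from the claims_triage table.
TRIAGE = [
    ("CL-501", "High", 0.89),
    ("CL-502", "High", 0.86),
    ("CL-503", "High", 0.91),
    ("CL-505", "Low", 0.06),
]
routes = {cid: route_claim(prob, cx) for cid, cx, prob in TRIAGE}
print(routes)
```

The fraud check runs before the complexity check so that a high-complexity, high-fraud claim lands with SIU rather than a senior adjuster, which matches the recommended actions in the triage table.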
Insurers implementing AI-driven triage report 30-50% reductions in average cycle time for straightforward claims and 15-25% reductions in total claims cost through better reserve accuracy and faster settlement.
Settlement prediction
Predicting the final settlement amount at FNOL determines reserve accuracy, which directly affects financial reporting and reinsurance costs. Traditional models use claim characteristics (injury type, vehicle damage, jurisdiction). Graph-based models add the outcomes of similar claims with the same providers, attorneys, and adjusters, providing 20-30% more accurate reserve estimates.
The foundation model approach
Building separate models for fraud detection, underwriting optimization, claims triage, settlement prediction, and retention requires five separate ML pipelines, five separate feature engineering efforts, and five separate maintenance budgets. At a mid-size insurer, this represents $3M-8M annually in ML team and infrastructure costs.
KumoRFM connects directly to the insurer's data warehouse, understands the full policy-claim-provider-claimant schema, and answers any prediction question without task-specific engineering. The same model that detects fraud also predicts settlement amounts, scores underwriting risk, and identifies lapse-prone policyholders.
On the RelBench benchmark, KumoRFM zero-shot achieves 76.71 AUROC across classification tasks, outperforming supervised GNNs trained specifically on each task. Fine-tuning raises this to 81.14 AUROC.
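For readers less familiar with the metric: AUROC is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, reported above on a 0-100 scale. The sketch below computes it by brute-force pairwise comparison on made-up labels and scores. It illustrates the definition only and has nothing to do with how RelBench results are produced.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Runs in O(P * N), which is fine for a demo.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = fraud) and model scores, for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.8, 0.6, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
print(round(toy_auroc * 100, 2))
```

A score of 50 corresponds to random ranking and 100 to a perfect one, so the gap between 76.71 and 81.14 is a gap in ranking quality, not in percent of claims classified correctly.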
For insurers, where the combined ratio determines profitability and every point of loss ratio improvement flows directly to the bottom line, the ability to predict better and faster across all lines of business simultaneously is not a technology upgrade. It is a competitive advantage that compounds every quarter.