AI in Insurance: Claims, Underwriting, and Fraud Detection

Insurance data is deeply relational. Policies, claims, adjusters, providers, treatments, agents, claimants. The companies that predict from this full graph structure are reducing claims costs 15-25% while catching fraud that flat models miss.

TL;DR

  • Insurance fraud costs $80 billion annually in the US. Organized fraud rings account for $5-8 billion in auto insurance alone and operate for years because individual claims pass traditional thresholds.
  • Graph-based detection identifies ring structures (shared providers, attorneys, tow companies, timing clusters) in days rather than the 12-18 months typical of manual SIU investigation.
  • Claims triage powered by graph AI predicts settlement amount, complexity, fraud probability, and litigation risk at FNOL, reducing cycle times by 30-50% and total claims cost by 15-25%.
  • Underwriting models incorporating relational risk signals (geographic claim density, provider network quality, policyholder similarity) improve loss ratios by 2-5 points.
  • One foundation model serves fraud detection, claims triage, settlement prediction, underwriting, and retention from a single platform. For insurers, combined ratio improvement flows directly to the bottom line.

Insurance fraud costs the US economy $80 billion annually, according to the Coalition Against Insurance Fraud. That is $80 billion in fictitious claims, inflated injuries, staged accidents, and provider billing schemes that flow straight through to higher premiums for everyone else. The FBI estimates the average American family pays $400-700 per year in excess premiums to cover it.

Every major insurer runs fraud detection models. They flag suspicious claims based on features like claim amount, injury type, time since policy inception, and claimant history. These models catch some fraud. They miss the fraud that costs the most: organized rings where providers, claimants, attorneys, and sometimes adjusters coordinate across dozens of claims.

The difference between catching individual fraudulent claims and catching fraud rings is the difference between recovering thousands and recovering millions. And the only way to see a ring is to see the graph.

claims — sample auto insurance data

| claim_id | policy_id | claimant | injury_type | amount | provider | attorney |
|---|---|---|---|---|---|---|
| CL-501 | POL-220 | J. Martinez | Whiplash | $32,000 | Dr. A. Shah | Law Office Chen |
| CL-502 | POL-318 | R. Thompson | Whiplash | $28,500 | Dr. A. Shah | Law Office Chen |
| CL-503 | POL-445 | K. Williams | Soft tissue | $35,200 | Dr. B. Patel | Law Office Chen |
| CL-504 | POL-112 | M. Garcia | Whiplash | $31,800 | Dr. A. Shah | Law Office Chen |
| CL-505 | POL-667 | D. Brown | Back pain | $29,400 | Dr. B. Patel | Self |

Four claims (CL-501 through CL-504) share the same attorney. Three of them share the same provider. All four involve soft-tissue injuries within a 60-day window. Individually plausible. As a network: a fraud ring.
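To make the shared-entity pattern concrete, here is a minimal Python sketch that groups the sample rows above by provider and attorney. The function name `shared_entities` and the `min_claims=3` cutoff are illustrative assumptions, not part of any product or of the original analysis.

```python
from collections import defaultdict

# Sample rows from the claims table above: (claim_id, injury_type, provider, attorney).
CLAIMS = [
    ("CL-501", "Whiplash", "Dr. A. Shah", "Law Office Chen"),
    ("CL-502", "Whiplash", "Dr. A. Shah", "Law Office Chen"),
    ("CL-503", "Soft tissue", "Dr. B. Patel", "Law Office Chen"),
    ("CL-504", "Whiplash", "Dr. A. Shah", "Law Office Chen"),
    ("CL-505", "Back pain", "Dr. B. Patel", "Self"),
]


def shared_entities(claims, min_claims=3):
    """Map (entity_type, name) -> claim ids, keeping entities linked to >= min_claims claims.

    min_claims is an arbitrary illustrative cutoff, not a calibrated threshold.
    """
    by_entity = defaultdict(list)
    for claim_id, _injury, provider, attorney in claims:
        by_entity[("provider", provider)].append(claim_id)
        # "Self" means the claimant has no attorney, so it is not a shared entity.
        if attorney != "Self":
            by_entity[("attorney", attorney)].append(claim_id)
    return {key: ids for key, ids in by_entity.items() if len(ids) >= min_claims}


shared = shared_entities(CLAIMS)
for (kind, name), claim_ids in sorted(shared.items()):
    print(f"{kind:9s} {name:16s} -> {claim_ids}")
```

A simple count like this only surfaces candidates. A busy, legitimate provider or law office will also appear on many claims, which is why the timing and injury-type signals matter alongside the shared entities.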

claims_triage — AI-predicted outcomes at FNOL

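| claim_id | predicted_settlement | complexity | fraud_prob | litigation_risk | recommended_action |
|---|---|---|---|---|---|
| CL-501 | $31,200 | High | 0.89 | 0.78 | Route to SIU |
| CL-502 | $27,800 | High | 0.86 | 0.72 | Route to SIU |
| CL-503 | $34,100 | High | 0.91 | 0.81 | Route to SIU |
| CL-505 | $18,200 | Low | 0.06 | 0.12 | Auto-adjudicate |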

Graph-based triage identifies the ring members immediately and routes them to SIU, while auto-adjudicating the legitimate claim.

Why insurance data is naturally relational

An insurance company's data model typically spans 20-40 tables. Policies. Policyholders. Claims. Claimants (who may differ from policyholders). Injuries and diagnoses. Treatments. Providers (doctors, hospitals, repair shops). Adjusters. Agents. Attorneys. Payments. Reserves. Reinsurance contracts. Geographic risk zones.

Every claim connects to a web of entities. A single auto accident claim might involve: the policyholder, 2 claimants, 3 medical providers, 1 auto repair shop, 1 attorney, 1 adjuster, 8 treatment events, and 12 payment transactions. That is 29 entities across 8 tables, connected through foreign keys that define who-treated-whom, who-represents-whom, and who-paid-whom.

Traditional fraud models collapse this structure into a single row: "Claim #12345: amount $45,000, injury type whiplash, time since inception 89 days, claimant has 2 prior claims." The entire relational context is lost.
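The contrast between the two views can be sketched in a few lines of Python. The flat row reuses the values quoted above. The edge list, the relation names, and every entity ID (including the second claim `67890`) are hypothetical, introduced only to show what the flat row cannot express.

```python
from collections import defaultdict

# Flat view: the single row quoted above. The relational context is gone.
flat_row = {
    "claim_id": "12345",
    "amount": 45_000,
    "injury_type": "whiplash",
    "days_since_inception": 89,
    "prior_claims": 2,
}

# Relational view: (source, relation, target) triples. All IDs are hypothetical.
edges = [
    ("claim:12345", "filed_under", "policy:P-1"),
    ("claim:12345", "filed_by", "claimant:C-1"),
    ("claim:12345", "treated_by", "provider:PRV-1"),
    ("claim:12345", "represented_by", "attorney:ATT-1"),
    ("claim:67890", "treated_by", "provider:PRV-1"),
    ("claim:67890", "represented_by", "attorney:ATT-1"),
]


def claims_sharing_entities(edges, claim):
    """Return {other_claim: {shared entities}} for claims touching the same entities."""
    claims_by_entity = defaultdict(set)
    for src, _relation, dst in edges:
        claims_by_entity[dst].add(src)
    shared = defaultdict(set)
    for entity, claims in claims_by_entity.items():
        if claim in claims:
            for other in claims - {claim}:
                shared[other].add(entity)
    return dict(shared)


linked = claims_sharing_entities(edges, "claim:12345")
print(linked)
```

Nothing in `flat_row` can answer "which other claims share this claim's provider and attorney?". The edge list answers it with one pass over the data.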

Fraud rings: the invisible threat

A fraud ring in auto insurance might work like this. A group of 20 people stage low-speed collisions in parking lots. Each files a separate claim with a different insurer. They all visit the same 3 medical providers, who bill for extensive treatment of soft-tissue injuries that cannot be verified by imaging. The same attorney represents all 20 claimants. The same tow company handles all the vehicles.

Each individual claim looks plausible. The amounts are within normal ranges. The injuries are consistent with the accidents. The providers are licensed. A flat model scoring each claim independently rates them as medium-risk at worst.

In the graph, the pattern is unmistakable. Twenty claimants connected to the same three providers, the same attorney, and the same tow company, with claims filed within a 60-day window. The hub-and-spoke topology screams organized fraud. But you can only see it if you look at the relationships.
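One simple way to surface this kind of hub-and-spoke structure is to group claims into connected components of a claim-entity graph. The sketch below uses breadth-first search over synthetic data shaped like the scenario above. It is an illustration under those assumptions, not the detection method of any particular product, and all identifiers are made up.

```python
from collections import defaultdict, deque


def build_graph(claim_links):
    """Build an undirected claim-entity graph from {claim_id: [entity, ...]}."""
    graph = defaultdict(set)
    for claim, entities in claim_links.items():
        graph[claim].update(entities)
        for entity in entities:
            graph[entity].add(claim)
    return graph


def claim_components(graph, claim_ids):
    """Group claims that are reachable from one another through shared entities."""
    claim_ids = set(claim_ids)
    seen, components = set(), []
    for start in sorted(claim_ids):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), set()
        while queue:
            node = queue.popleft()
            if node in claim_ids:
                members.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(members)
    return components


# Synthetic ring: 20 claims, 3 providers, 1 attorney, 1 tow company.
claim_links = {
    f"R-{i:02d}": [f"provider:P{i % 3}", "attorney:ATT-1", "tow:TOW-1"]
    for i in range(20)
}
# Five unrelated claims, each with its own provider and no attorney.
claim_links.update({f"I-{i:02d}": [f"provider:Q{i}"] for i in range(5)})

graph = build_graph(claim_links)
components = claim_components(graph, claim_links.keys())
largest = max(components, key=len)
hub_degree = {e: len(graph[e]) for e in ("attorney:ATT-1", "tow:TOW-1")}
print(len(largest), hub_degree)
```

On real data, plain connectivity over-links: a large hospital or a national law firm would join thousands of unrelated claims into one component. A practical version would need to discount high-volume entities and add the filing-time window, which this sketch leaves out.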

Provider fraud in health insurance

Health insurance fraud follows similar relational patterns. A provider billing for services not rendered, upcoding procedures, or unbundling bundled services creates anomalies that are visible in the provider-patient-treatment-diagnosis graph.

A provider with 200 patients who all received the same expensive diagnostic test within 30 days is suspicious. A provider whose patients are referred exclusively by two other providers, and whose billing volume spiked 300% in 6 months, is more suspicious. A provider whose patient overlap with known fraudulent providers exceeds statistical norms is a strong lead.
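The third signal, patient overlap with known fraudulent providers, can be approximated with a set-similarity measure. The sketch below uses the Jaccard index as one possible choice. The source does not say which overlap statistic is used, and the provider IDs and patient sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b|, or 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_overlap_with_flagged(patients_by_provider, flagged):
    """For each unflagged provider, return its highest Jaccard overlap with any flagged one."""
    scores = {}
    for provider, patients in patients_by_provider.items():
        if provider in flagged:
            continue
        scores[provider] = max(
            (jaccard(patients, patients_by_provider[f]) for f in flagged),
            default=0.0,
        )
    return scores


# Hypothetical patient panels keyed by provider id.
patients_by_provider = {
    "PRV-X": set(range(6, 16)),     # known fraudulent provider, patients 6..15
    "PRV-1": set(range(1, 11)),     # shares patients 6..10 with PRV-X
    "PRV-2": set(range(100, 110)),  # no shared patients
}

overlap = max_overlap_with_flagged(patients_by_provider, flagged={"PRV-X"})
print(overlap)
```

Whether a given overlap "exceeds statistical norms" depends on a baseline for comparable providers in the same region and specialty, which would have to be estimated from the insurer's own data.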

The National Health Care Anti-Fraud Association estimates that health care fraud costs $68 billion annually in the US, representing 3-10% of total health care spending. Graph-based detection can identify suspicious provider networks 2-3x faster than rule-based systems while reducing false positive rates by 40-50%.

Underwriting: precision risk pricing

Underwriting has historically relied on actuarial tables: age, location, vehicle type, driving record, credit score. These features are predictive, but they treat each applicant as an independent data point. Graph-based underwriting adds the relational dimension.

applicants — identical actuarial profiles

| applicant | age | vehicle | zip_code | credit_score | driving_record |
|---|---|---|---|---|---|
| Applicant A | 35 | 2023 Honda Accord | 97201 | 720 | Clean (5 years) |
| Applicant B | 34 | 2023 Honda Accord | 97205 | 715 | Clean (6 years) |

Virtually identical profiles. Traditional underwriting assigns them the same risk tier and premium within $20/year.

geographic_risk_graph — what the relational model sees

| metric | Applicant A (zip 97201) | Applicant B (zip 97205) |
|---|---|---|
| Auto theft claims (5-year) | 2 claims | 47 claims |
| Collision claims (5-year) | 8 claims | 31 claims |
| Avg claim severity | $4,200 | $11,800 |
| Nearby repair shop fraud rate | 1% | 12% |
| Similar policyholder loss ratio | 52% | 89% |

Applicant B's zip code has more than 23x the auto theft claims and nearly 4x the collision claims, and its average claim severity is almost 3x higher. Policyholders with similar profiles in that zip have an 89% loss ratio versus 52%. The expected loss differs by 3-5x despite nearly identical flat features.
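The multiples can be checked directly from the table. This short Python sketch computes the zip 97205 to zip 97201 ratio for each metric. The dictionary keys are illustrative names, not column names from any real schema.

```python
# Values copied from the geographic_risk_graph table above.
zip_97201 = {
    "theft_claims_5y": 2,
    "collision_claims_5y": 8,
    "avg_severity_usd": 4_200,
    "repair_shop_fraud_rate": 0.01,
    "similar_policyholder_loss_ratio": 0.52,
}
zip_97205 = {
    "theft_claims_5y": 47,
    "collision_claims_5y": 31,
    "avg_severity_usd": 11_800,
    "repair_shop_fraud_rate": 0.12,
    "similar_policyholder_loss_ratio": 0.89,
}

# Ratio of Applicant B's zip to Applicant A's zip for every metric.
ratios = {metric: zip_97205[metric] / zip_97201[metric] for metric in zip_97201}

for metric, ratio in ratios.items():
    print(f"{metric:34s} {ratio:6.2f}x")
```

The raw claim counts are not normalized by the number of policies in each zip code, so they cannot be multiplied by severity to get an expected loss per policy. The 3-5x expected-loss figure depends on exposure data that the table does not show.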

Beyond geography, relational underwriting considers: the claims history of similar policyholders (not just the applicant), the risk profiles of providers in the applicant's likely treatment network, correlation patterns between policy features and claims outcomes across the existing book, and economic indicators from connected entities (employers, industries, regions).

Traditional insurance AI

  • Scores individual claims with flat features
  • Underwriting uses actuarial tables and credit scores
  • Fraud detection catches individual bad actors
  • Claims triage based on simple rules
  • 5-10% of claims cost lost to undetected fraud

Graph-based insurance AI

  • Analyzes claim-provider-claimant-attorney network
  • Underwriting uses relational risk signals
  • Fraud detection catches organized rings
  • Claims triage predicts complexity, fraud, and litigation
  • 40-60% more fraud detected with fewer false positives

PQL Query

PREDICT fraud_ring_probability
FOR EACH claims.claim_id
WHERE claims.filed_date > '2025-01-01'

One query scores every new claim against the full claim-provider-claimant-attorney network. The model detects ring structures, shared entities, and coordinated filing patterns.

Output

| claim_id | fraud_ring_prob | ring_size | shared_entities | action |
|---|---|---|---|---|
| CL-501 | 0.89 | 4 | Provider + Attorney + timing | SIU investigation |
| CL-502 | 0.86 | 4 | Provider + Attorney + timing | SIU investigation |
| CL-503 | 0.91 | 4 | Attorney + timing + injury type | SIU investigation |
| CL-504 | 0.88 | 4 | Provider + Attorney + timing | SIU investigation |
| CL-505 | 0.06 | 0 | No network anomalies | Auto-adjudicate |

Claims management: speed and accuracy

The claims process is where insurers deliver on their promise. Speed matters: customers who wait more than 30 days for claims resolution are 2.5x more likely to switch carriers at renewal. Accuracy matters: underpayment leads to litigation, overpayment erodes margins.

Intelligent triage

When a claim arrives at first notice of loss (FNOL), graph-based models can predict: expected settlement amount (within 15-20% of the final amount), claims complexity (simple, moderate, complex), litigation likelihood, fraud probability, and subrogation potential. These predictions enable immediate routing: simple claims go to auto-adjudication or junior adjusters, complex claims go to senior adjusters, suspicious claims go to SIU.
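The routing step can be sketched as a small rule layer on top of the model's predictions. Everything here is an assumption for illustration: the `TriagePrediction` fields mirror the claims_triage table earlier in the article, the 0.5 thresholds are placeholders rather than calibrated values, and sending moderate claims to junior adjusters is one reading of the routing described above.

```python
from dataclasses import dataclass


@dataclass
class TriagePrediction:
    claim_id: str
    predicted_settlement: float
    complexity: str          # "Low", "Moderate", or "High"
    fraud_prob: float
    litigation_risk: float


def route(p, fraud_threshold=0.5, litigation_threshold=0.5):
    """Map model predictions to a handler. Thresholds are illustrative placeholders."""
    if p.fraud_prob >= fraud_threshold:
        return "Route to SIU"
    if p.complexity == "High" or p.litigation_risk >= litigation_threshold:
        return "Senior adjuster"
    if p.complexity == "Moderate":
        return "Junior adjuster"
    return "Auto-adjudicate"


# Rows from the claims_triage table earlier in the article.
predictions = [
    TriagePrediction("CL-501", 31_200, "High", 0.89, 0.78),
    TriagePrediction("CL-502", 27_800, "High", 0.86, 0.72),
    TriagePrediction("CL-503", 34_100, "High", 0.91, 0.81),
    TriagePrediction("CL-505", 18_200, "Low", 0.06, 0.12),
]

routing = {p.claim_id: route(p) for p in predictions}
print(routing)
```

Checking fraud probability first means a suspicious claim goes to SIU regardless of its complexity. In practice the thresholds would be tuned against SIU capacity and the cost of false positives.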

Insurers implementing AI-driven triage report 30-50% reductions in average cycle time for straightforward claims and 15-25% reductions in total claims cost through better reserve accuracy and faster settlement.

Settlement prediction

Predicting the final settlement amount at FNOL determines reserve accuracy, which directly affects financial reporting and reinsurance costs. Traditional models use claim characteristics (injury type, vehicle damage, jurisdiction). Graph-based models add the outcomes of similar claims with the same providers, attorneys, and adjusters, providing 20-30% more accurate reserve estimates.
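As a rough illustration of the relational signal described above, the sketch below computes the mean settlement of past claims that share a provider or attorney with a new claim. A graph-based model learns such signals from the data rather than using a hand-written aggregate like this one. The function name, record layout, and amounts are all hypothetical.

```python
from statistics import mean

# Hypothetical closed claims with the provider and attorney involved.
history = [
    {"provider": "PRV-1", "attorney": "ATT-1", "settlement": 30_000},
    {"provider": "PRV-1", "attorney": None, "settlement": 20_000},
    {"provider": "PRV-2", "attorney": "ATT-1", "settlement": 40_000},
    {"provider": "PRV-3", "attorney": None, "settlement": 10_000},
]


def peer_settlement_mean(history, provider, attorney=None):
    """Mean settlement of past claims sharing the provider or the attorney.

    Returns None when no past claim shares either entity.
    """
    peers = [
        row["settlement"]
        for row in history
        if row["provider"] == provider
        or (attorney is not None and row["attorney"] == attorney)
    ]
    return mean(peers) if peers else None


# New claim treated by PRV-1 and represented by ATT-1: three peer claims match.
estimate = peer_settlement_mean(history, "PRV-1", "ATT-1")
print(estimate)
```

An aggregate like this could serve as one input alongside the traditional claim characteristics (injury type, vehicle damage, jurisdiction) when setting an initial reserve.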

The foundation model approach

Building separate models for fraud detection, underwriting optimization, claims triage, settlement prediction, and retention requires five separate ML pipelines, five separate feature engineering efforts, and five separate maintenance budgets. At a mid-size insurer, this represents $3M-8M annually in ML team and infrastructure costs.

KumoRFM connects directly to the insurer's data warehouse, understands the full policy-claim-provider-claimant schema, and answers any prediction question without task-specific engineering. The same model that detects fraud also predicts settlement amounts, scores underwriting risk, and identifies lapse-prone policyholders.

On the RelBench benchmark, KumoRFM zero-shot achieves 76.71 AUROC across classification tasks, outperforming supervised GNNs trained specifically on each task. Fine-tuning pushes accuracy to 81.14 AUROC.

For insurers, where the combined ratio determines profitability and every point of loss ratio improvement flows directly to the bottom line, the ability to predict better and faster across all lines of business simultaneously is not a technology upgrade. It is a competitive advantage that compounds every quarter.

Frequently asked questions

How is AI used in insurance?

AI is used across the insurance value chain: underwriting (risk assessment and pricing), claims management (triage, settlement prediction, subrogation identification), fraud detection ($80B+ annual problem in the US alone), customer retention (lapse prediction and intervention), and distribution (lead scoring and cross-sell). The highest-ROI applications involve prediction tasks on relational data: policies, claims, claimants, providers, adjusters, and agents form a rich graph that carries predictive signals flat models miss.

How does graph-based AI improve insurance fraud detection?

Insurance fraud rings involve coordinated actors: claimants, providers, attorneys, and adjusters who work together across multiple claims. Flat models scoring individual claims miss these coordinated patterns. Graph-based AI analyzes the full claim-provider-claimant-attorney network, detecting clusters of connected entities with unusual patterns: the same provider treating claimants who all use the same attorney, claimants filing similar injuries at the same time through different agents, or providers billing for treatments that overlap in ways consistent with staged accidents.

What is the cost of insurance fraud in the United States?

The Coalition Against Insurance Fraud estimates that insurance fraud costs $80 billion annually in the US across all lines (property, casualty, health, auto, workers' comp). The FBI estimates that the average US family pays $400-700 per year in increased premiums due to fraud. For insurers, fraud accounts for 5-10% of total claims cost, making it the single largest source of unnecessary loss.

How can AI improve insurance underwriting?

Traditional underwriting uses application data, credit scores, and actuarial tables. Graph-based AI adds relational signals: the risk profiles of similar policyholders, claims patterns of providers in the applicant's network, geographic clustering of loss events, and correlation patterns between policy features and claims outcomes. This enables more precise risk segmentation, reducing adverse selection by 10-20% and improving loss ratios by 2-5 points.

What is claims triage and how does AI help?

Claims triage is the process of classifying incoming claims by complexity and routing them to appropriate handlers. Simple claims can be auto-adjudicated. Complex claims need senior adjusters. Potentially fraudulent claims need SIU review. AI models predict claim complexity, expected settlement amount, litigation likelihood, and fraud probability at the time of first notice of loss, enabling immediate routing that reduces cycle times by 30-50% and improves customer satisfaction for straightforward claims.
