
Provider Fraud Detection

Which healthcare providers are submitting suspicious claims?


A real-world example


Healthcare fraud accounts for 3-10% of total health spending, costing $68-230B annually in the US (National Health Care Anti-Fraud Association). Provider-driven fraud (upcoding, unbundling, phantom billing, unnecessary procedures) is the largest category, yet most schemes are detected only through retrospective audits 12-24 months after the billing occurs. By then, the insurer has already paid out millions. A single fraudulent medical practice can bill $5-15M before detection. SIU teams can only audit 1-2% of providers annually, so targeting accuracy is critical.

Quick answer

AI detects provider fraud by mapping billing patterns, referral networks, procedure-code distributions, and patient overlaps into a relational graph. Unlike retrospective audits that take 12-24 months to catch schemes, graph-based models identify suspicious providers within weeks by detecting anomalous patterns: concentrated referral networks, upcoding signatures, and billing rates far above specialty peers. A single fraudulent practice can bill $5-15M before traditional detection catches up.

Approaches compared

4 ways to solve this problem

1. Retrospective Claims Audits

SIU teams review provider billing patterns after the fact, typically 12-24 months after claims are paid. Triggered by complaints, tips, or periodic sampling. The traditional approach across the industry.

Best for

Building legal cases for recovery when fraud is already suspected. Audits produce evidence admissible in court.

Watch out for

By the time the audit happens, the insurer has already paid out $5-15M to the fraudulent provider. Recovery rates on paid claims are low. SIU teams can only audit 1-2% of providers annually.

2. Statistical Outlier Detection

Flag providers whose billing metrics (average claim amount, procedure frequency, claims per patient) exceed peer benchmarks by a statistical threshold. Automated, scalable, and easy to explain.

Best for

Catching egregious outliers: providers billing 3x the peer average are hard to miss with simple z-score analysis.

Watch out for

Smart fraudsters stay just below detection thresholds. Outlier detection also generates high false positives because legitimate practice variation (e.g., a surgeon who handles complex cases) looks like anomalous billing.
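The threshold-gaming problem above is easy to see in a minimal z-score sketch. This is an illustrative example with made-up peer values, not a production detector: a provider billing far above the specialty mean scores well past any cutoff, while one sitting just under a 2-sigma threshold goes unflagged.

```python
import statistics

# Hypothetical average-claim amounts for peer providers in one specialty
# (illustrative values only)
peer_avg_claims = [1500, 1400, 1600, 1550, 1450, 1500, 1480, 1520]

def z_score(value, peers):
    """Standard score of a provider's metric against its specialty peers."""
    mean = statistics.mean(peers)
    stdev = statistics.stdev(peers)
    return (value - mean) / stdev

# An egregious outlier ($4,800 vs ~$1,500 peer mean) is trivially caught
print(z_score(4800, peer_avg_claims))

# A fraudster billing just above the peer mean stays under a 2-sigma cutoff
print(z_score(1610, peer_avg_claims))
```

The second provider may be billing fraudulently on every claim, yet never crosses the statistical threshold, which is exactly why network structure matters.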

3. Rules-Based Upcoding Detection

Rules that flag specific billing patterns: high rates of E&M code 99215 (high complexity) vs. 99214 (moderate), unbundled procedure codes that should be billed together, or duplicate billing within short windows.

Best for

Known upcoding schemes where the billing pattern is well-documented and the rules are clear.

Watch out for

Rules catch yesterday's fraud. Providers shift to new coding patterns (e.g., unbundling instead of upcoding) once rules are published. Each new scheme requires a new rule, creating a perpetual cat-and-mouse game.
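A rule of this kind is typically just a ratio check over E&M code counts. The sketch below is hypothetical (the 40% cutoff is an assumed policy parameter, not an industry standard) and shows both the strength and the brittleness: it fires cleanly on the documented pattern, and is silent the moment billing shifts to a code mix the rule does not cover.

```python
def flag_upcoding(code_counts, high="99215", moderate="99214", max_high_share=0.40):
    """Flag a provider whose share of high-complexity E&M visits (99215)
    among 99214/99215 visits exceeds a fixed cutoff. Cutoff is illustrative."""
    total = code_counts.get(high, 0) + code_counts.get(moderate, 0)
    if total == 0:
        return False
    return code_counts.get(high, 0) / total > max_high_share

# Distribution like the suspicious provider in the example data: 68% 99215
print(flag_upcoding({"99215": 68, "99214": 12}))   # prints True

# Distribution like the peer norm: 22% 99215
print(flag_upcoding({"99215": 22, "99214": 45}))   # prints False
```

A provider who switches from upcoding to unbundling produces code counts this rule never examines, so the rule keeps returning False until someone writes a new one.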

4. Graph Neural Networks (Kumo's Approach)

Connects providers to their billing patterns, referral networks, patient overlaps, procedure distributions, and peer benchmarks in a relational graph. Detects coordinated fraud patterns across the provider-referrer-patient network.

Best for

Detecting referral-kickback schemes, patient-sharing fraud rings, and sophisticated upcoding patterns where the signal is in the network structure, not individual billing metrics.

Watch out for

Requires referral and patient-provider relationship data to be connected. If claims data does not capture referring-provider information, the network analysis is limited to billing patterns alone.

Key metric: Healthcare fraud costs $68-230B annually (NHCAA). Graph-based detection identifies fraudulent providers 6-12 months earlier, recovering $100-300M in overpayments for a top-10 health insurer.

Why relational data changes the answer

Flat provider-fraud models analyze each provider's billing in isolation: average claim amount, procedure mix, claims volume versus peers. They can flag a provider billing 3x the peer average. But they cannot see that the provider receives 92% of referrals from a single referring physician, shares 88% of patients with that referrer, bills high-complexity codes at 3x the peer rate specifically on referred patients, and that the referrer's own billing pattern changed dramatically after the relationship started. This referral-kickback pattern is one of the most expensive fraud schemes in healthcare, and it is invisible to any model that does not examine the provider network.

Relational learning maps the full provider ecosystem. The model walks from the target provider to their referral sources, to the patients they share, to the billing patterns on shared versus non-shared patients, to the referrer's own billing changes over time. It learns that concentrated referral relationships with high patient overlap and correlated billing anomalies are the signature of kickback schemes. It also detects that a provider's upcoding pattern is selective: they bill at high-complexity rates only for patients referred by the suspected partner, while billing normally for walk-in patients. This selectivity is a strong fraud signal that flat models miss entirely because they average across all patients.
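The raw ingredients of these network signals are straightforward to compute from claim rows. The sketch below is a simplified illustration (hypothetical records, a single provider, referrer names and codes invented for the example), not Kumo's implementation; it derives referral concentration and the selective-upcoding split between referred and walk-in patients that flat models average away.

```python
from collections import Counter

# Hypothetical claim records for one provider: (referrer, patient_id, cpt_code).
# A referrer of None means a walk-in patient. Values are illustrative.
claims = [
    ("DR_M", "P1", "99215"), ("DR_M", "P2", "99215"), ("DR_M", "P3", "99215"),
    ("DR_M", "P4", "99214"), (None,   "P5", "99214"), (None,   "P6", "99214"),
]

# Referral concentration: share of referred claims coming from the top referrer
referrers = Counter(r for r, _, _ in claims if r is not None)
top_referrer, top_count = referrers.most_common(1)[0]
concentration = top_count / sum(referrers.values())

# Selective upcoding: high-complexity (99215) rate on referred vs. walk-in claims
def high_rate(rows):
    return sum(1 for _, _, code in rows if code == "99215") / len(rows)

referred = [c for c in claims if c[0] is not None]
walk_in  = [c for c in claims if c[0] is None]

print(concentration)          # all referrals from one source
print(high_rate(referred))    # high-complexity rate on referred patients
print(high_rate(walk_in))     # high-complexity rate on walk-ins
```

A flat model sees one blended high-complexity rate; splitting by referral source exposes that the upcoding is concentrated on one relationship.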

Detecting provider fraud from billing summaries alone is like auditing a business by looking only at its income statement. The numbers might look unusual, but you cannot tell whether it is running a legitimate high-volume practice or laundering referral kickbacks. Graph-based detection is the equivalent of a forensic accountant who traces every dollar through the network of relationships: who referred whom, who treated whom, and how the billing changed after those relationships formed.

How KumoRFM solves this

Relational intelligence built for insurance data

Kumo connects providers, claims, patients, referral networks, procedure codes, and billing patterns into a relational graph. The model detects that Provider PRV-501 bills 3x the average number of high-complexity procedures, shares patients with a referring provider at an unusually high rate (92% of referrals come from one source), and has a billing-code distribution that deviates significantly from peer providers in the same specialty and region. These graph-based signals surface suspicious providers 6-12 months earlier than traditional audit triggers.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Step 1

Your data

The relational tables Kumo learns from

PROVIDERS

| provider_id | name | specialty | region | years_in_network |
|---|---|---|---|---|
| PRV-501 | MedPro Spine Clinic | Orthopedics | Southeast | 4.2 |
| PRV-502 | City General Radiology | Radiology | Northeast | 12.8 |
| PRV-503 | Sunrise Physical Therapy | PT/Rehab | West | 7.5 |

BILLING_PATTERNS

| provider_id | avg_claim_amount | high_complexity_rate | claims_per_patient | vs_peer_avg |
|---|---|---|---|---|
| PRV-501 | $4,800 | 78% | 8.4 | 3.2x peer avg |
| PRV-502 | $1,200 | 32% | 3.1 | 1.1x peer avg |
| PRV-503 | $850 | 15% | 12.2 | 1.8x peer avg |

REFERRAL_NETWORK

| provider_id | top_referrer | referral_concentration | patient_overlap_pct |
|---|---|---|---|
| PRV-501 | Dr. R. Martinez | 92% | 88% |
| PRV-502 | Multiple (15+) | 12% | 8% |
| PRV-503 | Dr. K. Patel | 65% | 52% |

PROCEDURE_ANALYSIS

| provider_id | top_code | frequency | peer_frequency | upcoding_signal |
|---|---|---|---|---|
| PRV-501 | 99214 (Moderate) | 12% | 45% | Low usage (possible upcoding to 99215) |
| PRV-501 | 99215 (High) | 68% | 22% | 3.1x above peer norm |
| PRV-503 | 97110 (Therapeutic) | 45% | 38% | 1.2x above peer norm |
Step 2

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT BOOL(PROVIDERS.FRAUD_CONFIRMED = 'True', 0, 12, months)
FOR EACH PROVIDERS.PROVIDER_ID
WHERE BILLING_PATTERNS.VS_PEER_AVG > 1.5
Step 3

Prediction output

Every entity gets a score, updated continuously

| PROVIDER_ID | SPECIALTY | FRAUD_SCORE | EST_OVERPAYMENT | SIU_PRIORITY |
|---|---|---|---|---|
| PRV-501 | Orthopedics | 0.89 | $2.4M/yr | Critical |
| PRV-503 | PT/Rehab | 0.52 | $420K/yr | High |
| PRV-502 | Radiology | 0.08 | $0 | Low |
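Since SIU teams can audit only 1-2% of providers, scores like these are most useful when combined with financial exposure. As an illustrative sketch (not a Kumo feature, and using only the example values from the table above), one simple triage is to rank providers by expected recoverable overpayment, fraud score times estimated annual overpayment:

```python
# Example predictions from the table above: (provider_id, fraud_score, est_overpayment_usd)
predictions = [
    ("PRV-501", 0.89, 2_400_000),
    ("PRV-503", 0.52, 420_000),
    ("PRV-502", 0.08, 0),
]

# Rank audit targets by expected recoverable overpayment
ranked = sorted(predictions, key=lambda p: p[1] * p[2], reverse=True)
for provider_id, score, overpayment in ranked:
    print(provider_id, round(score * overpayment))
```

Real triage would also weigh audit cost, recency of the pattern, and legal considerations, but the expected-value ordering is a common starting point for allocating scarce audit capacity.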
Step 4

Understand why

Every prediction includes feature attributions — no black boxes

Provider PRV-501 (MedPro Spine Clinic)

Predicted: 89% fraud probability, est. $2.4M/yr overpayment

Top contributing features

| Feature | Value | Attribution |
|---|---|---|
| High-complexity code rate (99215) | 68% vs 22% peer | 28% |
| Referral concentration from single source | 92% | 25% |
| Claims per patient far above peer | 8.4 vs 2.6 | 21% |
| Patient overlap with referring provider | 88% | 15% |
| Average claim amount anomaly | $4,800 vs $1,500 | 11% |

Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability

Frequently asked questions

Common questions about provider fraud detection

How does AI detect healthcare provider fraud?

AI detects provider fraud by analyzing the relational network: billing patterns, referral concentrations, patient overlaps, procedure-code distributions, and peer comparisons. Graph-based models identify suspicious patterns like concentrated referral networks, selective upcoding on referred patients, and billing anomalies that correlate with specific referral relationships. These network signals surface fraud 6-12 months earlier than retrospective audits.

What are the most common types of healthcare provider fraud?

The most costly schemes are upcoding (billing high-complexity codes for simple visits), referral kickbacks (paying for patient referrals in exchange for inflated billing), phantom billing (billing for services never rendered), and unbundling (billing separately for procedures that should be billed as a package). Upcoding and referral kickbacks are the hardest to detect because they look like legitimate high-acuity practice to simple outlier detection.

How much does healthcare provider fraud cost insurers?

Healthcare fraud costs $68-230B annually in the US (National Health Care Anti-Fraud Association), with provider-driven schemes accounting for the largest share. A single fraudulent practice can bill $5-15M before detection using traditional methods. The 12-24 month detection lag means insurers pay out billions in fraudulent claims before catching the pattern.

How can insurers prioritize which providers to audit?

Graph-based fraud scores prioritize providers by combining billing anomaly signals with network risk indicators (referral concentration, patient overlap, peer deviation). This produces a ranked list where the top-scored providers are 5-10x more likely to be confirmed fraudulent than randomly selected audit targets. SIU teams can focus their limited audit capacity on the highest-impact cases.

Bottom line: Detect fraudulent providers 6-12 months earlier and recover $100-300M in annual overpayments for a top-10 health insurer while focusing SIU resources on the highest-impact investigations.

Topics covered

provider fraud detection AI, healthcare fraud analytics, medical billing fraud prediction, insurance provider audit AI, graph neural network provider fraud, KumoRFM, relational deep learning insurance, upcoding detection, phantom billing detection, healthcare claims analytics

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.