Provider Fraud Detection
“Which healthcare providers are submitting suspicious claims?”
A real-world example
Which healthcare providers are submitting suspicious claims?
Healthcare fraud accounts for 3-10% of total health spending, costing $68-230B annually in the US (National Health Care Anti-Fraud Association). Provider-driven fraud (upcoding, unbundling, phantom billing, unnecessary procedures) is the largest category, yet most schemes are detected only through retrospective audits 12-24 months after the billing occurs. By then, the insurer has already paid out millions. A single fraudulent medical practice can bill $5-15M before detection. Special Investigations Unit (SIU) teams can only audit 1-2% of providers annually, so targeting accuracy is critical.
Quick answer
AI detects provider fraud by mapping billing patterns, referral networks, procedure-code distributions, and patient overlaps into a relational graph. Unlike retrospective audits that take 12-24 months to catch schemes, graph-based models identify suspicious providers within weeks by detecting anomalous patterns: concentrated referral networks, upcoding signatures, and billing rates far above specialty peers. A single fraudulent practice can bill $5-15M before traditional detection catches up.
Approaches compared
4 ways to solve this problem
1. Retrospective Claims Audits
SIU teams review provider billing patterns after the fact, typically 12-24 months after claims are paid. Triggered by complaints, tips, or periodic sampling. The traditional approach across the industry.
Best for
Building legal cases for recovery when fraud is already suspected. Audits produce evidence admissible in court.
Watch out for
By the time the audit happens, the insurer has already paid out $5-15M to the fraudulent provider. Recovery rates on paid claims are low. SIU teams can only audit 1-2% of providers annually.
2. Statistical Outlier Detection
Flag providers whose billing metrics (average claim amount, procedure frequency, claims per patient) exceed peer benchmarks by a statistical threshold. Automated, scalable, and easy to explain.
Best for
Catching egregious outliers: providers billing 3x the peer average are hard to miss with simple z-score analysis.
Watch out for
Smart fraudsters stay just below detection thresholds. Outlier detection also generates high false positives because legitimate practice variation (e.g., a surgeon who handles complex cases) looks like anomalous billing.
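The peer-benchmark logic above can be sketched in a few lines. This is an illustrative stand-alone example (provider IDs and claim amounts are hypothetical, not from any real dataset); it uses a leave-one-out z-score so a large outlier cannot inflate its own benchmark and mask itself:

```python
import statistics

def peer_zscores(billing):
    """Leave-one-out z-score: compare each provider's average claim
    amount against the mean/stdev of all OTHER providers, so an
    extreme outlier does not distort its own peer benchmark."""
    scores = {}
    for pid, amt in billing.items():
        peers = [v for p, v in billing.items() if p != pid]
        mean = statistics.mean(peers)
        stdev = statistics.stdev(peers)
        scores[pid] = (amt - mean) / stdev
    return scores

def flag_outliers(billing, threshold=3.0):
    """Return only the providers whose |z| exceeds the threshold."""
    return {pid: z for pid, z in peer_zscores(billing).items()
            if abs(z) > threshold}

# Hypothetical average claim amounts per provider
billing = {"PRV-501": 4800, "PRV-502": 1200, "PRV-503": 850,
           "PRV-504": 1100, "PRV-505": 1400}
print(flag_outliers(billing))  # only PRV-501 is flagged
```

Note the weakness called out above: a fraudster billing just under the threshold, or a legitimate high-acuity surgeon, defeats this test entirely.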
3. Rules-Based Upcoding Detection
Rules that flag specific billing patterns: high rates of E&M code 99215 (high complexity) vs. 99214 (moderate), unbundled procedure codes that should be billed together, or duplicate billing within short windows.
Best for
Known upcoding schemes where the billing pattern is well-documented and the rules are clear.
Watch out for
Rules catch yesterday's fraud. Providers shift to new coding patterns (e.g., unbundling instead of upcoding) once rules are published. Each new scheme requires a new rule, creating a perpetual cat-and-mouse game.
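As a rough sketch of what such rules look like in code (the claim fields and thresholds are illustrative, not any vendor's implementation), here are two of the rules described above: a high 99215-to-99214 ratio, and duplicate billing within a short window:

```python
from datetime import date, timedelta

def upcoding_rule(claims, high_ratio_threshold=0.5):
    """Flag a provider whose share of high-complexity E&M visits
    (99215) among all 99214/99215 visits exceeds a threshold."""
    em = [c for c in claims if c["code"] in ("99214", "99215")]
    if not em:
        return False
    high = sum(1 for c in em if c["code"] == "99215")
    return high / len(em) > high_ratio_threshold

def duplicate_billing_rule(claims, window=timedelta(days=1)):
    """Flag duplicate billing: the same code for the same patient
    within a short window."""
    seen = {}
    for c in sorted(claims, key=lambda c: c["date"]):
        key = (c["patient_id"], c["code"])
        if key in seen and c["date"] - seen[key] <= window:
            return True
        seen[key] = c["date"]
    return False

claims = [
    {"patient_id": "P1", "code": "99215", "date": date(2024, 1, 3)},
    {"patient_id": "P2", "code": "99215", "date": date(2024, 1, 4)},
    {"patient_id": "P3", "code": "99214", "date": date(2024, 1, 5)},
]
print(upcoding_rule(claims))  # True: 2 of 3 E&M visits are 99215
```

Each rule is cheap and explainable, but as noted, each new scheme requires a new rule.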
4. Graph Neural Networks (Kumo's Approach)
Connects providers to their billing patterns, referral networks, patient overlaps, procedure distributions, and peer benchmarks in a relational graph. Detects coordinated fraud patterns across the provider-referrer-patient network.
Best for
Detecting referral-kickback schemes, patient-sharing fraud rings, and sophisticated upcoding patterns where the signal is in the network structure, not individual billing metrics.
Watch out for
Requires referral and patient-provider relationship data to be connected. If claims data does not capture referring-provider information, the network analysis is limited to billing patterns alone.
Key metric: Healthcare fraud costs $68-230B annually (NHCAA). Graph-based detection identifies fraudulent providers 6-12 months earlier, recovering $100-300M in overpayments for a top-10 health insurer.
Why relational data changes the answer
Flat provider-fraud models analyze each provider's billing in isolation: average claim amount, procedure mix, claims volume versus peers. They can flag a provider billing 3x the peer average. But they cannot see that the provider receives 92% of referrals from a single referring physician, shares 88% of patients with that referrer, bills high-complexity codes at 3x the peer rate specifically on referred patients, or that the referrer's billing pattern changed dramatically after the relationship started. This referral-kickback pattern is one of the most expensive fraud schemes in healthcare, and it is invisible to any model that does not examine the provider network.
Relational learning maps the full provider ecosystem. The model walks from the target provider to their referral sources, to the patients they share, to the billing patterns on shared versus non-shared patients, to the referrer's own billing changes over time. It learns that concentrated referral relationships with high patient overlap and correlated billing anomalies are the signature of kickback schemes. It also detects that a provider's upcoding pattern is selective: they bill at high-complexity rates only for patients referred by the suspected partner, while billing normally for walk-in patients. This selectivity is a strong fraud signal that flat models miss entirely because they average across all patients.
Detecting provider fraud from billing summaries alone is like auditing a business by looking only at its income statement. The numbers might look unusual, but you cannot tell if it is running a legitimate high-volume practice or laundering referral kickbacks. Graph-based detection is the equivalent of a forensic accountant who traces every dollar through the network of relationships: who referred whom, who treated whom, and how the billing changed after those relationships formed.
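To make the network signals concrete, here is a minimal sketch of how the three relational features described above (referral concentration, patient overlap with the top referrer, and selective upcoding on referred versus walk-in patients) could be computed from raw claim rows. The field names and claim schema are hypothetical; a learned model such as a GNN derives signals like these from the graph rather than from hand-written code:

```python
from collections import Counter

def network_features(claims):
    """Compute relational fraud signals for one provider from its
    claim rows (each row: patient_id, E&M code, referrer or None).
    Assumes at least one referred claim exists."""
    referrers = Counter(c["referrer"] for c in claims if c["referrer"])
    total_referred = sum(referrers.values())
    top_referrer, top_count = referrers.most_common(1)[0]
    # Share of all referrals coming from the single biggest source
    concentration = top_count / total_referred

    # Fraction of the provider's patients shared with the top referrer
    patients = {c["patient_id"] for c in claims}
    shared = {c["patient_id"] for c in claims
              if c["referrer"] == top_referrer}
    overlap = len(shared) / len(patients)

    def high_rate(rows):
        return sum(1 for c in rows if c["code"] == "99215") / len(rows)

    # Selectivity: high-complexity billing gap, referred vs walk-in
    referred = [c for c in claims if c["referrer"] == top_referrer]
    walkin = [c for c in claims if c["referrer"] is None]
    selectivity = high_rate(referred) - high_rate(walkin) if walkin else None

    return {"top_referrer": top_referrer,
            "referral_concentration": concentration,
            "patient_overlap": overlap,
            "upcoding_selectivity": selectivity}
```

A provider that bills 99215 only on patients referred by one physician, while billing normally for walk-ins, produces a large `upcoding_selectivity`, exactly the signal that flat models average away.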
How KumoRFM solves this
Relational intelligence built for insurance data
Kumo connects providers, claims, patients, referral networks, procedure codes, and billing patterns into a relational graph. The model detects that Provider PRV-501 bills 3x the average number of high-complexity procedures, shares patients with a referring provider at an unusually high rate (92% of referrals come from one source), and has a billing-code distribution that deviates significantly from peer providers in the same specialty and region. These graph-based signals surface suspicious providers 6-12 months earlier than traditional audit triggers.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
PROVIDERS
| provider_id | name | specialty | region | years_in_network |
|---|---|---|---|---|
| PRV-501 | MedPro Spine Clinic | Orthopedics | Southeast | 4.2 |
| PRV-502 | City General Radiology | Radiology | Northeast | 12.8 |
| PRV-503 | Sunrise Physical Therapy | PT/Rehab | West | 7.5 |
BILLING_PATTERNS
| provider_id | avg_claim_amount | high_complexity_rate | claims_per_patient | vs_peer_avg |
|---|---|---|---|---|
| PRV-501 | $4,800 | 78% | 8.4 | 3.2x peer avg |
| PRV-502 | $1,200 | 32% | 3.1 | 1.1x peer avg |
| PRV-503 | $850 | 15% | 12.2 | 1.8x peer avg |
REFERRAL_NETWORK
| provider_id | top_referrer | referral_concentration | patient_overlap_pct |
|---|---|---|---|
| PRV-501 | Dr. R. Martinez | 92% | 88% |
| PRV-502 | Multiple (15+) | 12% | 8% |
| PRV-503 | Dr. K. Patel | 65% | 52% |
PROCEDURE_ANALYSIS
| provider_id | top_code | frequency | peer_frequency | upcoding_signal |
|---|---|---|---|---|
| PRV-501 | 99214 (Moderate) | 12% | 45% | Low usage (possible upcoding to 99215) |
| PRV-501 | 99215 (High) | 68% | 22% | 3.1x above peer norm |
| PRV-503 | 97110 (Therapeutic) | 45% | 38% | 1.2x above peer norm |
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT BOOL(PROVIDERS.FRAUD_CONFIRMED = 'True', 0, 12, months) FOR EACH PROVIDERS.PROVIDER_ID WHERE BILLING_PATTERNS.VS_PEER_AVG > 1.5
Prediction output
Every entity gets a score, updated continuously
| PROVIDER_ID | SPECIALTY | FRAUD_SCORE | EST_OVERPAYMENT | SIU_PRIORITY |
|---|---|---|---|---|
| PRV-501 | Orthopedics | 0.89 | $2.4M/yr | Critical |
| PRV-503 | PT/Rehab | 0.52 | $420K/yr | High |
| PRV-502 | Radiology | 0.08 | $0 | Low |
Understand why
Every prediction includes feature attributions — no black boxes
Provider PRV-501 (MedPro Spine Clinic)
Predicted: 89% fraud probability, est. $2.4M/yr overpayment
Top contributing features
High-complexity code rate (99215)
68% vs 22% peer
28% attribution
Referral concentration from single source
92%
25% attribution
Claims per patient far above peer
8.4 vs 2.6
21% attribution
Patient overlap with referring provider
88%
15% attribution
Average claim amount anomaly
$4,800 vs $1,500
11% attribution
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about provider fraud detection
How does AI detect healthcare provider fraud?
AI detects provider fraud by analyzing the relational network: billing patterns, referral concentrations, patient overlaps, procedure-code distributions, and peer comparisons. Graph-based models identify suspicious patterns like concentrated referral networks, selective upcoding on referred patients, and billing anomalies that correlate with specific referral relationships. These network signals surface fraud 6-12 months earlier than retrospective audits.
What are the most common types of healthcare provider fraud?
The most costly schemes are upcoding (billing high-complexity codes for simple visits), referral kickbacks (paying for patient referrals in exchange for inflated billing), phantom billing (billing for services never rendered), and unbundling (billing separately for procedures that should be billed as a package). Upcoding and referral kickbacks are the hardest to detect because they look like legitimate high-acuity practice to simple outlier detection.
How much does healthcare provider fraud cost insurers?
Healthcare fraud costs $68-230B annually in the US (National Health Care Anti-Fraud Association), with provider-driven schemes accounting for the largest share. A single fraudulent practice can bill $5-15M before detection using traditional methods. The 12-24 month detection lag means insurers pay out billions in fraudulent claims before catching the pattern.
How can insurers prioritize which providers to audit?
Graph-based fraud scores prioritize providers by combining billing anomaly signals with network risk indicators (referral concentration, patient overlap, peer deviation). This produces a ranked list where the top-scored providers are 5-10x more likely to be confirmed fraudulent than randomly selected audit targets. SIU teams can focus their limited audit capacity on the highest-impact cases.
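A simple expected-value ranking illustrates the prioritization idea (this is an illustrative sketch, not Kumo's scoring logic; the field names mirror the prediction-output table above):

```python
def prioritize(predictions, audit_capacity):
    """Rank providers by expected recoverable overpayment
    (fraud probability x estimated annual overpayment) and return
    the top slice an SIU team can actually audit."""
    ranked = sorted(
        predictions,
        key=lambda p: p["fraud_score"] * p["est_overpayment"],
        reverse=True,
    )
    return ranked[:audit_capacity]

predictions = [
    {"provider_id": "PRV-501", "fraud_score": 0.89, "est_overpayment": 2_400_000},
    {"provider_id": "PRV-503", "fraud_score": 0.52, "est_overpayment": 420_000},
    {"provider_id": "PRV-502", "fraud_score": 0.08, "est_overpayment": 0},
]
top = prioritize(predictions, audit_capacity=1)
print(top[0]["provider_id"])  # PRV-501
```

Weighting by estimated overpayment, not score alone, keeps the limited audit budget pointed at the cases with the largest recoverable dollars.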
Bottom line: Detect fraudulent providers 6-12 months earlier and recover $100-300M in annual overpayments for a top-10 health insurer while focusing SIU resources on the highest-impact investigations.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.