Provider Fraud Detection
“Which healthcare providers are submitting suspicious claims?”
A real-world example
Which healthcare providers are submitting suspicious claims?
Healthcare fraud accounts for 3-10% of total health spending, costing $68-230B annually in the US (National Health Care Anti-Fraud Association). Provider-driven fraud (upcoding, unbundling, phantom billing, unnecessary procedures) is the largest category, yet most schemes are detected only through retrospective audits 12-24 months after the billing occurs. By then, the insurer has already paid out millions. A single fraudulent medical practice can bill $5-15M before detection. Special Investigations Unit (SIU) teams can only audit 1-2% of providers annually, so targeting accuracy is critical.
Quick answer
AI detects provider fraud by mapping billing patterns, referral networks, procedure-code distributions, and patient overlaps into a relational graph. Unlike retrospective audits that take 12-24 months to catch schemes, graph-based models identify suspicious providers within weeks by detecting anomalous patterns: concentrated referral networks, upcoding signatures, and billing rates far above specialty peers. A single fraudulent practice can bill $5-15M before traditional detection catches up.
Approaches compared
4 ways to solve this problem
1. Retrospective Claims Audits
SIU teams review provider billing patterns after the fact, typically 12-24 months after claims are paid. Triggered by complaints, tips, or periodic sampling. The traditional approach across the industry.
Best for
Building legal cases for recovery when fraud is already suspected. Audits produce evidence admissible in court.
Watch out for
By the time the audit happens, the insurer has already paid out $5-15M to the fraudulent provider. Recovery rates on paid claims are low. SIU teams can only audit 1-2% of providers annually.
2. Statistical Outlier Detection
Flag providers whose billing metrics (average claim amount, procedure frequency, claims per patient) exceed peer benchmarks by a statistical threshold. Automated, scalable, and easy to explain.
Best for
Catching egregious outliers: providers billing 3x the peer average are hard to miss with simple z-score analysis.
Watch out for
Smart fraudsters stay just below detection thresholds. Outlier detection also generates high false positives because legitimate practice variation (e.g., a surgeon who handles complex cases) looks like anomalous billing.
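The peer-benchmark logic above can be sketched in a few lines. This is an illustrative stand-alone example (provider IDs and claim amounts are hypothetical, not from any real dataset); it uses a leave-one-out z-score so a large outlier cannot inflate its own benchmark and mask itself:

```python
import statistics

def peer_zscores(billing):
    """Leave-one-out z-score: compare each provider's average claim
    amount against the mean/stdev of all OTHER providers, so an
    extreme outlier does not distort its own peer benchmark."""
    scores = {}
    for pid, amt in billing.items():
        peers = [v for p, v in billing.items() if p != pid]
        mean = statistics.mean(peers)
        stdev = statistics.stdev(peers)
        scores[pid] = (amt - mean) / stdev
    return scores

def flag_outliers(billing, threshold=3.0):
    """Return only the providers whose |z| exceeds the threshold."""
    return {pid: z for pid, z in peer_zscores(billing).items()
            if abs(z) > threshold}

# Hypothetical average claim amounts per provider
billing = {"PRV-501": 4800, "PRV-502": 1200, "PRV-503": 850,
           "PRV-504": 1100, "PRV-505": 1400}
print(flag_outliers(billing))  # only PRV-501 is flagged
```

Note the weakness called out above: a fraudster billing just under the threshold, or a legitimate high-acuity surgeon, defeats this test entirely.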
3. Rules-Based Upcoding Detection
Rules that flag specific billing patterns: high rates of E&M code 99215 (high complexity) vs. 99214 (moderate), unbundled procedure codes that should be billed together, or duplicate billing within short windows.
Best for
Known upcoding schemes where the billing pattern is well-documented and the rules are clear.
Watch out for
Rules catch yesterday's fraud. Providers shift to new coding patterns (e.g., unbundling instead of upcoding) once rules are published. Each new scheme requires a new rule, creating a perpetual cat-and-mouse game.
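As a rough sketch of what such rules look like in code (the claim fields and thresholds are illustrative, not any vendor's implementation), here are two of the rules described above: a high 99215-to-99214 ratio, and duplicate billing within a short window:

```python
from datetime import date, timedelta

def upcoding_rule(claims, high_ratio_threshold=0.5):
    """Flag a provider whose share of high-complexity E&M visits
    (99215) among all 99214/99215 visits exceeds a threshold."""
    em = [c for c in claims if c["code"] in ("99214", "99215")]
    if not em:
        return False
    high = sum(1 for c in em if c["code"] == "99215")
    return high / len(em) > high_ratio_threshold

def duplicate_billing_rule(claims, window=timedelta(days=1)):
    """Flag duplicate billing: the same code for the same patient
    within a short window."""
    seen = {}
    for c in sorted(claims, key=lambda c: c["date"]):
        key = (c["patient_id"], c["code"])
        if key in seen and c["date"] - seen[key] <= window:
            return True
        seen[key] = c["date"]
    return False

claims = [
    {"patient_id": "P1", "code": "99215", "date": date(2024, 1, 3)},
    {"patient_id": "P2", "code": "99215", "date": date(2024, 1, 4)},
    {"patient_id": "P3", "code": "99214", "date": date(2024, 1, 5)},
]
print(upcoding_rule(claims))  # True: 2 of 3 E&M visits are 99215
```

Each rule is cheap and explainable, but as noted, each new scheme requires a new rule.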
4. Graph Neural Networks (Kumo's Approach)
Connects providers to their billing patterns, referral networks, patient overlaps, procedure distributions, and peer benchmarks in a relational graph. Detects coordinated fraud patterns across the provider-referrer-patient network.
Best for
Detecting referral-kickback schemes, patient-sharing fraud rings, and sophisticated upcoding patterns where the signal is in the network structure, not individual billing metrics.
Watch out for
Requires referral and patient-provider relationship data to be connected. If claims data does not capture referring-provider information, the network analysis is limited to billing patterns alone.
Key metric: Healthcare fraud costs $68-230B annually (NHCAA). Graph-based detection identifies fraudulent providers 6-12 months earlier, recovering $100-300M in overpayments for a top-10 health insurer.
Why relational data changes the answer
Flat provider-fraud models analyze each provider's billing in isolation: average claim amount, procedure mix, claims volume versus peers. They can flag a provider billing 3x the peer average. But they cannot see that the provider receives 92% of referrals from a single referring physician, shares 88% of patients with that referrer, bills high-complexity codes at 3x the peer rate specifically on referred patients, or that the referrer's billing pattern changed dramatically after the relationship started. This referral-kickback pattern is one of the most expensive fraud schemes in healthcare, and it is invisible to any model that does not examine the provider network.
Relational learning maps the full provider ecosystem. The model walks from the target provider to their referral sources, to the patients they share, to the billing patterns on shared versus non-shared patients, to the referrer's own billing changes over time. It learns that concentrated referral relationships with high patient overlap and correlated billing anomalies are the signature of kickback schemes. It also detects that a provider's upcoding pattern is selective: they bill at high-complexity rates only for patients referred by the suspected partner, while billing normally for walk-in patients. This selectivity is a strong fraud signal that flat models miss entirely because they average across all patients.
Detecting provider fraud from billing summaries alone is like auditing a business by looking only at its income statement. The numbers might look unusual, but you cannot tell if it is running a legitimate high-volume practice or laundering referral kickbacks. Graph-based detection is the equivalent of a forensic accountant who traces every dollar through the network of relationships: who referred whom, who treated whom, and how the billing changed after those relationships formed.
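To make the network signals concrete, here is a minimal sketch of how the three relational features described above (referral concentration, patient overlap with the top referrer, and selective upcoding on referred versus walk-in patients) could be computed from raw claim rows. The field names and claim schema are hypothetical; a learned model such as a GNN derives signals like these from the graph rather than from hand-written code:

```python
from collections import Counter

def network_features(claims):
    """Compute relational fraud signals for one provider from its
    claim rows (each row: patient_id, E&M code, referrer or None).
    Assumes at least one referred claim exists."""
    referrers = Counter(c["referrer"] for c in claims if c["referrer"])
    total_referred = sum(referrers.values())
    top_referrer, top_count = referrers.most_common(1)[0]
    # Share of all referrals coming from the single biggest source
    concentration = top_count / total_referred

    # Fraction of the provider's patients shared with the top referrer
    patients = {c["patient_id"] for c in claims}
    shared = {c["patient_id"] for c in claims
              if c["referrer"] == top_referrer}
    overlap = len(shared) / len(patients)

    def high_rate(rows):
        return sum(1 for c in rows if c["code"] == "99215") / len(rows)

    # Selectivity: high-complexity billing gap, referred vs walk-in
    referred = [c for c in claims if c["referrer"] == top_referrer]
    walkin = [c for c in claims if c["referrer"] is None]
    selectivity = high_rate(referred) - high_rate(walkin) if walkin else None

    return {"top_referrer": top_referrer,
            "referral_concentration": concentration,
            "patient_overlap": overlap,
            "upcoding_selectivity": selectivity}
```

A provider that bills 99215 only on patients referred by one physician, while billing normally for walk-ins, produces a large `upcoding_selectivity`, exactly the signal that flat models average away.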
How KumoRFM solves this
Relational intelligence built for insurance data
Kumo connects providers, claims, patients, referral networks, procedure codes, and billing patterns into a relational graph. The model detects that Provider PRV-501 bills 3x the average number of high-complexity procedures, shares patients with a referring provider at an unusually high rate (92% of referrals come from one source), and has a billing-code distribution that deviates significantly from peer providers in the same specialty and region. These graph-based signals surface suspicious providers 6-12 months earlier than traditional audit triggers.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
PROVIDERS
| provider_id | name | specialty | region | years_in_network |
|---|---|---|---|---|
| PRV-501 | MedPro Spine Clinic | Orthopedics | Southeast | 4.2 |
| PRV-502 | City General Radiology | Radiology | Northeast | 12.8 |
| PRV-503 | Sunrise Physical Therapy | PT/Rehab | West | 7.5 |
BILLING_PATTERNS
| provider_id | avg_claim_amount | high_complexity_rate | claims_per_patient | vs_peer_avg |
|---|---|---|---|---|
| PRV-501 | $4,800 | 78% | 8.4 | 3.2x peer avg |
| PRV-502 | $1,200 | 32% | 3.1 | 1.1x peer avg |
| PRV-503 | $850 | 15% | 12.2 | 1.8x peer avg |
REFERRAL_NETWORK
| provider_id | top_referrer | referral_concentration | patient_overlap_pct |
|---|---|---|---|
| PRV-501 | Dr. R. Martinez | 92% | 88% |
| PRV-502 | Multiple (15+) | 12% | 8% |
| PRV-503 | Dr. K. Patel | 65% | 52% |
PROCEDURE_ANALYSIS
| provider_id | top_code | frequency | peer_frequency | upcoding_signal |
|---|---|---|---|---|
| PRV-501 | 99214 (Moderate) | 12% | 45% | Low usage (possible upcoding to 99215) |
| PRV-501 | 99215 (High) | 68% | 22% | 3.1x above peer norm |
| PRV-503 | 97110 (Therapeutic) | 45% | 38% | 1.2x above peer norm |
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT BOOL(PROVIDERS.FRAUD_CONFIRMED = 'True', 0, 12, months) FOR EACH PROVIDERS.PROVIDER_ID WHERE BILLING_PATTERNS.VS_PEER_AVG > 1.5
Prediction output
Every entity gets a score, updated continuously
| PROVIDER_ID | SPECIALTY | FRAUD_SCORE | EST_OVERPAYMENT | SIU_PRIORITY |
|---|---|---|---|---|
| PRV-501 | Orthopedics | 0.89 | $2.4M/yr | Critical |
| PRV-503 | PT/Rehab | 0.52 | $420K/yr | High |
| PRV-502 | Radiology | 0.08 | $0 | Low |
Understand why
Every prediction includes feature attributions — no black boxes
Provider PRV-501 (MedPro Spine Clinic)
Predicted: 89% fraud probability, est. $2.4M/yr overpayment
Top contributing features
High-complexity code rate (99215)
68% vs 22% peer
28% attribution
Referral concentration from single source
92%
25% attribution
Claims per patient far above peer
8.4 vs 2.6
21% attribution
Patient overlap with referring provider
88%
15% attribution
Average claim amount anomaly
$4,800 vs $1,500
11% attribution
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about provider fraud detection
How does AI detect healthcare provider fraud?
AI detects provider fraud by analyzing the relational network: billing patterns, referral concentrations, patient overlaps, procedure-code distributions, and peer comparisons. Graph-based models identify suspicious patterns like concentrated referral networks, selective upcoding on referred patients, and billing anomalies that correlate with specific referral relationships. These network signals surface fraud 6-12 months earlier than retrospective audits.
What are the most common types of healthcare provider fraud?
The most costly schemes are upcoding (billing high-complexity codes for simple visits), referral kickbacks (paying for patient referrals in exchange for inflated billing), phantom billing (billing for services never rendered), and unbundling (billing separately for procedures that should be billed as a package). Upcoding and referral kickbacks are the hardest to detect because they look like legitimate high-acuity practice to simple outlier detection.
How much does healthcare provider fraud cost insurers?
Healthcare fraud costs $68-230B annually in the US (National Health Care Anti-Fraud Association), with provider-driven schemes accounting for the largest share. A single fraudulent practice can bill $5-15M before detection using traditional methods. The 12-24 month detection lag means insurers pay out billions in fraudulent claims before catching the pattern.
How can insurers prioritize which providers to audit?
Graph-based fraud scores prioritize providers by combining billing anomaly signals with network risk indicators (referral concentration, patient overlap, peer deviation). This produces a ranked list where the top-scored providers are 5-10x more likely to be confirmed fraudulent than randomly selected audit targets. SIU teams can focus their limited audit capacity on the highest-impact cases.
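A simple expected-value ranking illustrates the prioritization idea (this is an illustrative sketch, not Kumo's scoring logic; the field names mirror the prediction-output table above):

```python
def prioritize(predictions, audit_capacity):
    """Rank providers by expected recoverable overpayment
    (fraud probability x estimated annual overpayment) and return
    the top slice an SIU team can actually audit."""
    ranked = sorted(
        predictions,
        key=lambda p: p["fraud_score"] * p["est_overpayment"],
        reverse=True,
    )
    return ranked[:audit_capacity]

predictions = [
    {"provider_id": "PRV-501", "fraud_score": 0.89, "est_overpayment": 2_400_000},
    {"provider_id": "PRV-503", "fraud_score": 0.52, "est_overpayment": 420_000},
    {"provider_id": "PRV-502", "fraud_score": 0.08, "est_overpayment": 0},
]
top = prioritize(predictions, audit_capacity=1)
print(top[0]["provider_id"])  # PRV-501
```

Weighting by estimated overpayment, not score alone, keeps the limited audit budget pointed at the cases with the largest recoverable dollars.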
Bottom line: Detect fraudulent providers 6-12 months earlier and recover $100-300M in annual overpayments for a top-10 health insurer while focusing SIU resources on the highest-impact investigations.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.