The business problem
The Coalition Against Insurance Fraud estimates that fraud costs US insurers $80 billion annually. This cost is passed directly to consumers through higher premiums. The most damaging fraud involves organized rings: networks of claimants, providers, and professionals who submit coordinated false claims. Individual claim analysis flags obvious outliers but misses these sophisticated operations.
Why flat ML fails
- No ring detection: Organized fraud rings involve 5-50 participants filing claims that individually look normal. The pattern is only visible in the network connections between claims.
- No provider context: A doctor with 100 patients is normal. A doctor whose patients all share the same lawyer and body shop is suspicious. Flat models see patient counts, not connection patterns.
- Relationship type matters: The claimant-doctor relationship carries different fraud signal than claimant-witness or claim-adjuster relationships. Flat models treat all connections equally.
- Temporal patterns: Fraud rings ramp up gradually, filing increasingly brazen claims over time. The temporal evolution of the network is a strong signal that flat features miss.
The relational schema
Node types:

```
Claim     (id, amount, type, date, status)
Claimant  (id, age, policy_tenure, claim_history)
Provider  (id, specialty, license_date, avg_billing)
Policy    (id, type, premium, coverage_limit)
```

Edge types:

```
Claim    --[filed_by]-->    Claimant
Claim    --[treated_by]-->  Provider
Claim    --[under]-->       Policy
Claimant --[referred_by]--> Provider
Provider --[co_billed]-->   Provider  (shared_claims_count)
```

Four node types and five edge types. The co_billed edges between providers surface ring structures: providers who repeatedly appear on the same claims.
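One way to encode this schema for a relation-aware GNN is a single global `edge_index` plus an integer `edge_type` per edge. A minimal sketch with toy node ids; the relation-to-integer ordering is an assumption (any consistent ordering works, as long as it matches the model's `num_relations=5`):

```python
import torch

# Assumed mapping from the five relation names to integer ids.
REL = {"filed_by": 0, "treated_by": 1, "under": 2,
       "referred_by": 3, "co_billed": 4}

# Toy graph with global node indices: claim 0 filed by claimant 3,
# treated by provider 5, under policy 7; provider 5 co-billed with 6.
edges = [
    (0, 3, "filed_by"),
    (0, 5, "treated_by"),
    (0, 7, "under"),
    (3, 6, "referred_by"),
    (5, 6, "co_billed"),
]

edge_index = torch.tensor([[s for s, d, r in edges],
                           [d for s, d, r in edges]], dtype=torch.long)
edge_type = torch.tensor([REL[r] for s, d, r in edges], dtype=torch.long)

print(edge_index.shape, edge_type.shape)  # torch.Size([2, 5]) torch.Size([5])
```

These two tensors are exactly the `edge_index` and `edge_type` arguments an RGCN-style forward pass expects.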
PyG architecture: RGCNConv for claims networks
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import RGCNConv, Linear


class ClaimsFraudGNN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim=64, num_relations=5):
        super().__init__()
        self.lin = Linear(in_dim, hidden_dim)
        # RGCNConv: separate weights per edge type; basis decomposition
        # shares 4 basis matrices across the 5 relations to cut parameters
        self.conv1 = RGCNConv(
            hidden_dim, hidden_dim, num_relations=num_relations,
            num_bases=4)
        self.conv2 = RGCNConv(
            hidden_dim, hidden_dim, num_relations=num_relations,
            num_bases=4)
        self.classifier = torch.nn.Sequential(
            Linear(hidden_dim, 32),
            torch.nn.ReLU(),
            Linear(32, 1),
        )

    def forward(self, x, edge_index, edge_type):
        x = F.relu(self.lin(x))
        x = F.relu(self.conv1(x, edge_index, edge_type))
        x = self.conv2(x, edge_index, edge_type)
        return torch.sigmoid(self.classifier(x).squeeze(-1))
```
```python
# Training with focal loss for class imbalance
def focal_loss(pred, target, gamma=2.0, alpha=0.75):
    bce = F.binary_cross_entropy(pred, target, reduction='none')
    pt = torch.where(target == 1, pred, 1 - pred)
    # alpha weights the positive class, (1 - alpha) the negative class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    weight = alpha_t * (1 - pt) ** gamma
    return (weight * bce).mean()
```

RGCNConv with basis decomposition: separate weight matrices per relationship type let the model learn that claimant-provider connections carry a different fraud signal than provider-provider connections.
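Focal loss down-weights confident, easy predictions so the rare fraud positives dominate the gradient. A quick behavioral check (the loss is restated with per-class alpha weighting so the snippet runs on its own; the probabilities are illustrative):

```python
import torch
import torch.nn.functional as F


def focal_loss(pred, target, gamma=2.0, alpha=0.75):
    bce = F.binary_cross_entropy(pred, target, reduction='none')
    pt = torch.where(target == 1, pred, 1 - pred)
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    weight = alpha_t * (1 - pt) ** gamma
    return (weight * bce).mean()


# An easy, correctly-scored positive (p=0.95) contributes far less
# to the loss than a hard, missed positive (p=0.10).
easy = focal_loss(torch.tensor([0.95]), torch.tensor([1.0]))
hard = focal_loss(torch.tensor([0.10]), torch.tensor([1.0]))
print(easy.item() < hard.item())  # True
```

With `gamma=0` the weighting collapses to plain alpha-balanced BCE; larger `gamma` pushes the loss harder toward misclassified examples.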
Expected performance
- Rule-based system: ~45 AUROC (high false positive rate)
- LightGBM (flat-table): 62.44 AUROC
- GNN (RGCNConv): 75.83 AUROC
- KumoRFM (zero-shot): 76.71 AUROC
Or use KumoRFM in one line
```
PREDICT is_fraudulent FOR claim
USING claim, claimant, provider, policy
```

One PQL query. KumoRFM constructs the claims network, detects ring patterns, and outputs fraud probabilities with explainable attributions.