Insurance Claims Fraud: RGCNConv on Claims Networks

Insurance fraud costs $80B+ annually in the US alone. Individual claim analysis catches obvious cases but misses coordinated fraud rings. Here is how to build a GNN that detects ring structures in the claims network.

TL;DR

  • Insurance fraud is a graph pattern problem. Fraud rings create dense subgraphs in the claims network: the same providers, lawyers, and claimants appearing together across multiple suspicious claims.
  • RGCNConv applies different weights per relationship type (claimant-doctor, claimant-lawyer, claim-provider), learning type-specific fraud signatures.
  • On RelBench benchmarks, GNNs achieve 75.83 AUROC vs 62.44 for flat-table LightGBM. Ring structure detection accounts for most of the improvement.
  • The PyG model is ~35 lines, but production claims fraud systems need SIU workflow integration, regulatory reporting, and appeal handling.
  • KumoRFM detects claims fraud with one PQL query (76.71 AUROC zero-shot), automatically constructing the claims network and detecting ring structures.

The business problem

The Coalition Against Insurance Fraud estimates that fraud costs US insurers $80 billion annually. This cost is passed directly to consumers through higher premiums. The most damaging fraud involves organized rings: networks of claimants, providers, and professionals who submit coordinated false claims. Individual claim analysis flags obvious outliers but misses these sophisticated operations.

Why flat ML fails

  • No ring detection: Organized fraud rings involve 5-50 participants filing claims that individually look normal. The pattern is only visible in the network connections between claims.
  • No provider context: A doctor with 100 patients is normal. A doctor whose patients all share the same lawyer and body shop is suspicious. Flat models see patient counts, not connection patterns.
  • Relationship type matters: The claimant-doctor relationship carries a different fraud signal than the claimant-witness or claim-adjuster relationship. Flat models treat all connections equally.
  • Temporal patterns: Fraud rings ramp up gradually, filing increasingly brazen claims over time. The temporal evolution of the network is a strong signal that flat features miss.

The relational schema

schema.txt
Node types:
  Claim     (id, amount, type, date, status)
  Claimant  (id, age, policy_tenure, claim_history)
  Provider  (id, specialty, license_date, avg_billing)
  Policy    (id, type, premium, coverage_limit)

Edge types:
  Claim    --[filed_by]-->    Claimant
  Claim    --[treated_by]-->  Provider
  Claim    --[under]-->       Policy
  Claimant --[referred_by]--> Provider
  Provider --[co_billed]-->   Provider (shared_claims_count)

Four node types and five edge types. The co_billed edges between providers surface ring structures: providers who repeatedly appear on the same claims.
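
As a rough illustration of how co_billed edges could be derived, the sketch below counts how often two providers appear on the same claim. The claim_providers mapping, its IDs, and the function name are hypothetical; the schema does not say how shared_claims_count is computed, so the pair-counting rule and the min_shared threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def build_co_billed_edges(claim_providers, min_shared=2):
    """Count provider pairs that appear together on the same claim.

    claim_providers: dict mapping claim id -> iterable of provider ids
    (hypothetical input format). Returns {(provider_a, provider_b): count}
    for pairs that share at least `min_shared` claims.
    """
    pair_counts = Counter()
    for providers in claim_providers.values():
        # sorted() gives each unordered pair one canonical key
        for a, b in combinations(sorted(set(providers)), 2):
            pair_counts[(a, b)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_shared}

# Made-up sample: P1, P2, P3 repeatedly bill together; P4 and P5 appear once.
claim_providers = {
    "C1": ["P1", "P2", "P3"],
    "C2": ["P1", "P2"],
    "C3": ["P2", "P3", "P1"],
    "C4": ["P4", "P1"],
    "C5": ["P5"],
}
co_billed = build_co_billed_edges(claim_providers)
```

Each surviving pair would become one Provider --[co_billed]--> Provider edge, with the count stored as shared_claims_count.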

PyG architecture: RGCNConv for claims networks

claims_fraud_model.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import RGCNConv, Linear

class ClaimsFraudGNN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim=64, num_relations=5):
        super().__init__()
        self.lin = Linear(in_dim, hidden_dim)

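        # RGCNConv: separate weights per edge type.
        # Basis decomposition builds each relation's weight matrix as a
        # learned mix of shared bases. It only saves parameters when
        # num_bases < num_relations, so use 4 bases for 5 relations.
        self.conv1 = RGCNConv(
            hidden_dim, hidden_dim, num_relations=num_relations,
            num_bases=4)
        self.conv2 = RGCNConv(
            hidden_dim, hidden_dim, num_relations=num_relations,
            num_bases=4)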

        self.classifier = torch.nn.Sequential(
            Linear(hidden_dim, 32),
            torch.nn.ReLU(),
            Linear(32, 1),
        )

    def forward(self, x, edge_index, edge_type):
        x = F.relu(self.lin(x))
        x = F.relu(self.conv1(x, edge_index, edge_type))
        x = self.conv2(x, edge_index, edge_type)

        return torch.sigmoid(self.classifier(x).squeeze(-1))

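# Training with focal loss for class imbalance
def focal_loss(pred, target, gamma=2.0, alpha=0.75):
    # pred: fraud probabilities from the model; target: float 0/1 labels
    bce = F.binary_cross_entropy(pred, target, reduction='none')
    # pt is the probability the model assigned to the true class
    pt = torch.where(target == 1, pred, 1 - pred)
    # Down-weight easy examples (pt near 1) so hard ones dominate the loss
    weight = alpha * (1 - pt) ** gamma
    return (weight * bce).mean()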

RGCNConv with basis decomposition. Separate weight matrices per relationship type let the model learn that claimant-provider connections carry a different fraud signal than provider-provider connections.
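
To make the basis-decomposition trade-off concrete, the sketch below counts relation-specific weights with and without shared bases. It assumes the standard R-GCN formulation, in which each relation's weight matrix is a learned linear combination of num_bases shared matrices. The helper name is hypothetical, and the count ignores the layer's self-loop (root) weight and bias.

```python
def rgcn_relation_params(in_dim, out_dim, num_relations, num_bases=None):
    """Number of relation-specific weights in one R-GCN layer (a sketch)."""
    if num_bases is None:
        # One full (in_dim x out_dim) matrix per relation
        return num_relations * in_dim * out_dim
    # num_bases shared matrices, plus one mixing coefficient
    # per (relation, basis) pair
    return num_bases * in_dim * out_dim + num_relations * num_bases

full = rgcn_relation_params(64, 64, num_relations=5)
two_bases = rgcn_relation_params(64, 64, num_relations=5, num_bases=2)
eight_bases = rgcn_relation_params(64, 64, num_relations=5, num_bases=8)
```

Under these assumptions the decomposition only shrinks the layer when num_bases is smaller than num_relations. With five relations and a hidden size of 64, two bases cut the count from 20,480 to 8,202, while eight bases raise it to 32,808.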

Expected performance

  • Rule-based system: ~45 AUROC (high false positive rate)
  • LightGBM (flat-table): 62.44 AUROC
  • GNN (RGCNConv): 75.83 AUROC
  • KumoRFM (zero-shot): 76.71 AUROC

Or use KumoRFM in one query

KumoRFM PQL
PREDICT is_fraudulent FOR claim
USING claim, claimant, provider, policy

One PQL query. KumoRFM constructs the claims network, detects ring patterns, and outputs fraud probabilities with explainable attributions.

Frequently asked questions

Why are GNNs effective for insurance claims fraud?

Insurance fraud often involves coordinated rings: the same doctors, lawyers, body shops, and claimants appear together across multiple suspicious claims. These ring patterns are invisible when analyzing individual claims but obvious in the claims network graph. GNNs detect these structural patterns automatically.

What is RGCNConv and why use it for claims fraud?

RGCNConv is PyG's implementation of the Relational Graph Convolutional Network (R-GCN) layer. It applies a different weight matrix per edge type. In a claims graph, the relationship between a claimant and their doctor is different from the relationship between a claimant and their lawyer. RGCNConv learns these type-specific patterns, which is critical for detecting coordinated fraud.
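
RGCNConv consumes a single edge_index plus one integer edge_type per edge, so a typed claims graph has to be flattened first. The sketch below shows one way to do that in plain Python: every node type gets an offset into a shared ID space, and every relation name gets an integer ID. The node counts, the sample edges, and the helper name are hypothetical; in PyG itself, HeteroData.to_homogeneous() performs a similar conversion.

```python
NODE_COUNTS = {"claim": 3, "claimant": 2, "provider": 2, "policy": 2}  # made up
RELATIONS = ["filed_by", "treated_by", "under", "referred_by", "co_billed"]

# Typed edges as (src_type, src_local_id, relation, dst_type, dst_local_id)
TYPED_EDGES = [
    ("claim", 0, "filed_by", "claimant", 0),
    ("claim", 1, "filed_by", "claimant", 1),
    ("claim", 0, "treated_by", "provider", 1),
    ("claim", 2, "under", "policy", 0),
    ("claimant", 1, "referred_by", "provider", 0),
    ("provider", 0, "co_billed", "provider", 1),
]

def flatten_typed_graph(node_counts, relations, typed_edges):
    """Return (edge_index, edge_type) as plain lists in RGCNConv's layout."""
    offsets, total = {}, 0
    for node_type, count in node_counts.items():
        offsets[node_type] = total  # first global id for this node type
        total += count
    relation_id = {name: i for i, name in enumerate(relations)}

    sources, targets, edge_type = [], [], []
    for src_type, src, rel, dst_type, dst in typed_edges:
        sources.append(offsets[src_type] + src)
        targets.append(offsets[dst_type] + dst)
        edge_type.append(relation_id[rel])
    return [sources, targets], edge_type

edge_index, edge_type = flatten_typed_graph(NODE_COUNTS, RELATIONS, TYPED_EDGES)
```

Wrapping both lists in torch.tensor(..., dtype=torch.long) would give tensors of shape [2, num_edges] and [num_edges]. Node features of the different types would also need to be projected to a common width before being stacked into one x matrix.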

How do you handle the class imbalance in insurance fraud?

Insurance fraud rates are typically 5-10% of claims. Use weighted cross-entropy loss with higher weight on fraud cases, or focal loss to focus on hard-to-classify borderline cases. Graph-level augmentation (subgraph sampling centered on known fraud rings) can also help balance training.
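
To see how the two loss options treat an imbalanced batch differently, here is a plain-Python sketch of per-example weighted cross-entropy and focal loss. The predicted probabilities, the fraud weight of 10, and the function names are illustrative assumptions; the focal form with gamma=2.0 and alpha=0.75 mirrors the focal_loss function in the model code.

```python
import math

def weighted_bce(p, y, pos_weight=10.0):
    """Cross-entropy with the fraud class (y == 1) up-weighted."""
    if y == 1:
        return -pos_weight * math.log(p)
    return -math.log(1.0 - p)

def focal(p, y, gamma=2.0, alpha=0.75):
    """Focal loss: scales cross-entropy by alpha * (1 - pt) ** gamma."""
    pt = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -alpha * (1.0 - pt) ** gamma * math.log(pt)

# (predicted fraud probability, label): two easy cases and one missed fraud
examples = [(0.05, 0), (0.95, 1), (0.30, 1)]
for p, y in examples:
    print(f"p={p:.2f} y={y} "
          f"weighted={weighted_bce(p, y):.4f} focal={focal(p, y):.6f}")

# How much smaller the focal loss is on the easy fraud case than the hard one
easy_to_hard_ratio = focal(0.95, 1) / focal(0.30, 1)
```

The focal term shrinks the loss on confidently correct predictions far more than on the misclassified fraud case, which is how it concentrates training on hard examples. Weighted cross-entropy instead scales every fraud example by the same factor, regardless of difficulty.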

What is the role of the claims network topology in fraud detection?

Fraud rings create dense subgraphs: a small set of providers and claimants that are densely connected through multiple claims. Legitimate claims create sparse, tree-like patterns. The GNN learns to distinguish these topological signatures, flagging unusually dense subgraph regions as suspicious.
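
A simple way to quantify the dense-versus-sparse contrast is edge density, the fraction of possible node pairs that are actually connected. The sketch below computes it for two hypothetical four-node subgraphs. The node names and edges are made up, and a GNN learns such signatures implicitly rather than computing this statistic.

```python
def edge_density(nodes, edges):
    """Undirected edge density: actual edges / possible node pairs."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Ignore self-loops and treat (a, b) and (b, a) as the same edge
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    return len(unique_edges) / (n * (n - 1) / 2)

nodes = ["A", "B", "C", "D"]
# Ring-like: every participant is linked to every other one
ring_edges = [("A", "B"), ("A", "C"), ("A", "D"),
              ("B", "C"), ("B", "D"), ("C", "D")]
# Tree-like: one hub with three otherwise unconnected neighbors
tree_edges = [("A", "B"), ("A", "C"), ("A", "D")]

ring_density = edge_density(nodes, ring_edges)
tree_density = edge_density(nodes, tree_edges)
```

In this toy case the ring-like subgraph has density 1.0 and the tree-like one 0.5; real claims subgraphs would sit somewhere in between.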

Can KumoRFM detect insurance claims fraud?

Yes. KumoRFM takes your claims database (claims, claimants, providers, policies) and detects fraud patterns with one PQL query. It automatically constructs the claims network, detects ring structures, and outputs fraud probabilities per claim.
