
Fraud Detection: HeteroConv on Banking Transaction Graphs

Banks lose $30B+ annually to transaction fraud. Flat-table models catch simple cases but miss coordinated fraud rings. Here is how to build a heterogeneous GNN that sees the full network and catches what XGBoost cannot.

TL;DR

  • Banking fraud is a graph problem: accounts, merchants, devices, and transactions form a heterogeneous network. Fraud rings are invisible to single-transaction models but obvious in the graph.
  • HeteroConv with GATConv per edge type lets you model different relationship types (sends-to, pays-at, uses-device) with separate learned transformations and attention weights.
  • A 2-layer HeteroConv model on a transaction graph reaches 75.83 AUROC on RelBench benchmarks, compared to 62.44 for flat-table LightGBM, a 13+ point improvement.
  • The PyG implementation requires ~40 lines of model code plus significant infrastructure for graph construction, mini-batch sampling, and real-time serving.
  • KumoRFM achieves 76.71 AUROC zero-shot on the same benchmark with one line of PQL. No graph construction, no training code, no serving infrastructure.

The business problem

Global card fraud losses exceeded $30 billion in 2023 and are projected to reach $40 billion by 2027. Every basis point of improvement in detection saves millions. The challenge is not just catching fraud but catching it before the transaction completes, in under 100 milliseconds.

Traditional models see each transaction in isolation: amount, time, merchant category, velocity counters. They miss the relational signal. A $50 coffee purchase looks normal until you see that the card was used at a gas station 300 miles away 10 minutes ago, the merchant shares a terminal ID with three other flagged merchants, and the IP address is associated with a device used in five other fraud cases this week.

Why flat ML fails

Flat-table models like XGBoost or logistic regression operate on hand-engineered features derived from a single row of data. You can add velocity features (transactions per hour), aggregate features (average spend at this merchant), and even some network-derived features (degree of separation from known fraud). But these features are:

  • Static snapshots that miss evolving patterns
  • Manually engineered, requiring domain expertise to define and maintain
  • Lossy, collapsing rich relational structure into a few numbers
  • One-hop at best, missing the multi-hop patterns that define fraud rings
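To make the "lossy, one-hop" point concrete, here is a minimal sketch (toy data, hypothetical account and merchant names) of a classic velocity feature. Note how the flat feature keeps one number per row while discarding the cross-account structure:

```python
from collections import defaultdict

# Toy transactions: (txn_id, account, merchant, amount, hour)
txns = [
    ("t1", "acct_a", "m1", 50.0, 9),
    ("t2", "acct_a", "m2", 40.0, 9),
    ("t3", "acct_b", "m2", 45.0, 9),
    ("t4", "acct_c", "m2", 55.0, 9),
]

# One-hop "velocity" feature: transactions per (account, hour)
velocity = defaultdict(int)
for _, acct, _, _, hour in txns:
    velocity[(acct, hour)] += 1

# Each row now carries a single scalar...
features = {t[0]: velocity[(t[1], t[4])] for t in txns}

# ...but the 2-hop pattern (acct_a, acct_b, and acct_c all funneling
# into merchant m2 in the same hour -- a possible ring) is gone: no
# flat feature of t1 alone records which *other* accounts hit m2.
```

A GNN's message passing recovers exactly this discarded signal: merchant m2 aggregates information from all accounts touching it, then passes it back.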

On RelBench fraud benchmarks, LightGBM with extensive feature engineering achieves 62.44 AUROC. A GNN that directly operates on the transaction graph achieves 75.83 AUROC, a gap that no amount of feature engineering can close because the signal lives in the graph structure itself.

The relational schema

A banking fraud graph typically includes these entities and relationships:

schema.txt
Node types:
  Account    (id, balance, account_age, type)
  Merchant   (id, category, avg_txn, terminal_count)
  Device     (id, os, ip_hash, first_seen)
  Transaction (id, amount, timestamp, is_fraud)

Edge types:
  Account   --[sends_to]-->    Account
  Account   --[transacts_at]--> Merchant
  Account   --[uses]-->        Device
  Transaction --[from]-->      Account
  Transaction --[to]-->        Merchant

Four node types and five edge types. Flat-table models collapse this into a single row. GNNs preserve the full structure.

PyG architecture: HeteroConv + GATConv

The heterogeneous schema demands HeteroConv, which wraps a separate GNN layer per edge type. Inside each edge type, we use GATConv so the model can learn which neighbors matter most. A transaction to a flagged merchant should carry more weight than a transaction to a grocery store.

fraud_detection_model.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import HeteroConv, GATConv, Linear

class FraudGNN(torch.nn.Module):
    def __init__(self, hidden_dim=64, heads=4):
        super().__init__()
        # Project each node type to a shared dim
        self.account_lin = Linear(-1, hidden_dim)
        self.merchant_lin = Linear(-1, hidden_dim)
        self.device_lin = Linear(-1, hidden_dim)

        # Layer 1: type-specific GAT convolutions.
        # add_self_loops=False is required on bipartite edge types,
        # where source and target node sets differ.
        self.conv1 = HeteroConv({
            ('account', 'sends_to', 'account'): GATConv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
            ('account', 'transacts_at', 'merchant'): GATConv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
            ('account', 'uses', 'device'): GATConv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
        }, aggr='sum')

        # Layer 2: second hop
        self.conv2 = HeteroConv({
            ('account', 'sends_to', 'account'): GATConv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
            ('account', 'transacts_at', 'merchant'): GATConv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
            ('account', 'uses', 'device'): GATConv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
        }, aggr='sum')

        self.classifier = Linear(hidden_dim, 1)

    def forward(self, x_dict, edge_index_dict):
        # Encode node features per type (new dict, so the caller's
        # input is not mutated)
        x_dict = {
            'account': self.account_lin(x_dict['account']),
            'merchant': self.merchant_lin(x_dict['merchant']),
            'device': self.device_lin(x_dict['device']),
        }

        # Message passing (2 hops)
        x_dict = self.conv1(x_dict, edge_index_dict)
        x_dict = {k: F.elu(v) for k, v in x_dict.items()}
        x_dict = self.conv2(x_dict, edge_index_dict)

        # Score accounts; a transaction is flagged via its source
        # account's score
        return self.classifier(x_dict['account']).squeeze(-1)

~40 lines of model code. But you still need graph construction, mini-batch sampling with NeighborLoader, class-imbalanced loss, and serving infrastructure.

Training and evaluation

Training a fraud GNN involves several additional considerations beyond the model code:

  • Class imbalance: Use focal loss or weighted BCE. Fraud is typically 0.1-0.5% of transactions.
  • Mini-batch sampling: Use PyG's NeighborLoader to sample 2-hop subgraphs. Full-batch training is infeasible on production-scale transaction graphs.
  • Temporal splitting: Never leak future information. Train on transactions before time T, validate on T to T+1, test on T+1 to T+2.
  • Feature encoding: Categorical features (merchant category, device OS) need embedding layers. Numerical features (amount, balance) need normalization.
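The class-imbalance point is worth a concrete sketch. One simple option (assuming weighted BCE rather than focal loss) is to derive `pos_weight` from the observed fraud rate, shown here with synthetic labels and zero logits as a stand-in for model outputs:

```python
import torch
import torch.nn.functional as F

# Synthetic labels with ~0.5% fraud rate (toy stand-in for real data)
torch.manual_seed(0)
y = (torch.rand(10_000) < 0.005).float()

# pos_weight up-weights the rare positive class so the model is not
# rewarded for predicting "no fraud" everywhere
n_pos = y.sum()
n_neg = y.numel() - n_pos
pos_weight = n_neg / n_pos.clamp(min=1)

logits = torch.zeros_like(y)  # stand-in for model outputs
loss = F.binary_cross_entropy_with_logits(logits, y,
                                          pos_weight=pos_weight)

# Unweighted BCE at logits = 0 is ~0.693 regardless of labels; the
# weighted version multiplies each positive's term by pos_weight
```

Focal loss is the usual alternative when even weighted BCE saturates on the easy negatives; the same `pos_weight` logic carries over to its alpha term.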

Expected performance

On RelBench fraud-related benchmarks, the performance hierarchy is clear:

  • LightGBM (flat-table): 62.44 AUROC
  • GNN (hand-tuned HeteroConv): 75.83 AUROC
  • KumoRFM (zero-shot): 76.71 AUROC

The 13+ point gap between flat-table and GNN represents the structural signal that lives in the graph. The additional 0.88 points from KumoRFM come from its pre-trained relational graph transformer, which has learned patterns across many relational datasets.

Or use KumoRFM in one line

KumoRFM replaces the entire pipeline above with a single Predictive Query:

KumoRFM PQL
PREDICT is_fraud FOR transaction
USING account, merchant, device, transaction

One line of PQL. KumoRFM auto-constructs the heterogeneous graph, selects the architecture, trains with temporal awareness, and serves predictions via API. No graph construction, no architecture selection, no training loop, no serving infrastructure. Its pre-trained relational graph transformer handles heterogeneous schemas, temporal dynamics, and class imbalance automatically, reaching 76.71 AUROC on the same benchmark, slightly ahead of the hand-tuned GNN baseline, in minutes instead of months.

Frequently asked questions

Why are GNNs better than flat-table models for fraud detection?

Flat-table models (XGBoost, logistic regression) only see features of the individual transaction. GNNs see the transaction in context: who sent money to whom, how those accounts are connected, and whether the surrounding network looks suspicious. Fraud rings that are invisible to flat-table models become obvious when you look at the graph structure.

What PyG layer works best for fraud detection?

HeteroConv with GATConv per edge type. Banking data is inherently heterogeneous (accounts, transactions, merchants, devices). HeteroConv applies type-specific transformations, and GATConv within each type learns which neighbors matter most, critical for separating suspicious connections from routine ones.

How do you handle class imbalance in fraud detection with GNNs?

Fraud is rare (typically 0.1-0.5% of transactions). Use focal loss or weighted cross-entropy to up-weight fraud cases. You can also oversample fraud subgraphs during mini-batch training with PyG's NeighborLoader, ensuring each batch contains enough positive examples for stable gradients.

Can GNN fraud detection run in real time?

Yes, but it requires infrastructure. Pre-compute node embeddings for known entities (accounts, merchants). When a new transaction arrives, run a 2-hop subgraph query, compute the new edge embedding using cached neighbor features, and classify. Inference takes 5-20ms with proper caching. KumoRFM handles this infrastructure automatically.
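The cached-embedding pattern described above can be sketched in a few lines. Everything here is a hypothetical stand-in: random embeddings for the nightly-refreshed cache and an untrained linear head for the classifier:

```python
import torch

# Hypothetical pre-computed node embeddings for known entities,
# refreshed offline (e.g. a nightly batch pass over the full graph)
torch.manual_seed(0)
cache = {
    'account':  {i: torch.randn(64) for i in range(1000)},
    'merchant': {i: torch.randn(64) for i in range(200)},
}

# Stand-in scorer; in practice this is the trained classifier head
scorer = torch.nn.Linear(128, 1)

def score_transaction(account_id: int, merchant_id: int) -> float:
    """Score a new edge from cached node embeddings only --
    no message passing at request time."""
    h = torch.cat([cache['account'][account_id],
                   cache['merchant'][merchant_id]])
    with torch.no_grad():
        return torch.sigmoid(scorer(h)).item()

p = score_transaction(42, 7)  # fraud probability in [0, 1]
```

The trade-off is staleness: cached embeddings lag the graph by one refresh cycle, which is why production systems often combine this fast path with periodic full re-embedding.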

How does KumoRFM compare to custom PyG fraud models?

KumoRFM achieves 76.71 AUROC on RelBench benchmarks vs 75.83 for hand-tuned GNN baselines, without any manual feature engineering, architecture search, or training code. It handles heterogeneous schemas, temporal dynamics, and class imbalance out of the box. A custom PyG model can match this but typically requires months of ML engineering.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.