Graph neural networks detect fraud that tabular models structurally cannot see. A gradient-boosted tree scores each transaction or account as an independent row. It can learn that high-amount transactions at 3 AM are risky. But it cannot learn that five accounts sharing the same device fingerprint all sent money to the same beneficiary within 10 minutes. That pattern exists in the connections between rows, not in any single row.
When you represent transactions, accounts, devices, and merchants as a graph, fraud rings become visible structural patterns: dense clusters of nodes with unusual connectivity. GNNs learn to recognize these patterns automatically.
Why tabular models fail on coordinated fraud
Consider a money laundering ring with 8 accounts. Each account individually looks normal: moderate balances, reasonable transaction amounts, accounts aged 6+ months. A tabular model scores each account at low risk.
But in the graph, the pattern is obvious. These 8 accounts form a near-complete subgraph: they transact almost exclusively with each other, share 2 device fingerprints, and all received their initial deposits from the same source account within 48 hours. The graph structure screams anomaly. The tabular features whisper normalcy.
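One way to make "near-complete subgraph" concrete is to measure edge density and how much of the group's activity stays inside the group. The sketch below is a minimal illustration in plain Python. The account IDs, the transfers, and the helper names `subgraph_density` and `internal_ratio` are all hypothetical, and a real system would compute these over far larger graphs.

```python
from itertools import combinations

def subgraph_density(nodes, edges):
    """Fraction of possible undirected pairs among `nodes` that are connected."""
    nodes = set(nodes)
    pairs = {
        frozenset(e) for e in edges
        if e[0] in nodes and e[1] in nodes and e[0] != e[1]
    }
    possible = len(nodes) * (len(nodes) - 1) // 2
    return len(pairs) / possible if possible else 0.0

def internal_ratio(nodes, edges):
    """Share of transfers touching `nodes` that stay entirely inside `nodes`."""
    nodes = set(nodes)
    touching = [e for e in edges if e[0] in nodes or e[1] in nodes]
    inside = [e for e in touching if e[0] in nodes and e[1] in nodes]
    return len(inside) / len(touching) if touching else 0.0

# Hypothetical ring of 8 accounts: 25 of the 28 possible pairs transact.
ring = [f"acct_{i}" for i in range(8)]
ring_edges = list(combinations(ring, 2))[:-3]
# Two made-up transfers that leave the ring.
outside_edges = [("acct_0", "shop_a"), ("acct_5", "shop_b")]
edges = ring_edges + outside_edges

density = subgraph_density(ring, edges)   # 25 / 28
ratio = internal_ratio(ring, edges)       # 25 / 27
print(f"density={density:.2f}, internal ratio={ratio:.2f}")
```

Both numbers are close to 1 for the ring, while a typical group of unrelated retail accounts would be expected to score far lower. Neither value can be computed from a single account's row.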
Building the fraud graph
A production fraud graph is heterogeneous, meaning it has multiple node types and edge types:
- Node types: Account, Device, IP Address, Merchant, Transaction
- Edge types: account-uses-device, account-has-IP, account-sends-to-account, transaction-at-merchant, account-initiates-transaction
- Node features: Account (age, balance, country), Device (OS, fingerprint hash), Transaction (amount, timestamp, channel)
- Edge features: Timestamp, amount, frequency of connection
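As a rough illustration of the schema above, the sketch below stores a heterogeneous graph in plain Python dictionaries, keyed by node type and by (source type, relation, target type) triples. PyG's HeteroData uses the same triple convention for edge types. The `FraudGraph` class, its methods, and every ID and value here are hypothetical and exist only for this example.

```python
from collections import defaultdict

class FraudGraph:
    """Toy heterogeneous graph: typed nodes and typed edges with attributes."""

    def __init__(self):
        # node_type -> {node_id: feature dict}
        self.node_features = defaultdict(dict)
        # (src_type, relation, dst_type) -> [(src_id, dst_id, attrs)]
        self.edges = defaultdict(list)

    def add_node(self, node_type, node_id, **features):
        self.node_features[node_type][node_id] = features

    def add_edge(self, src_type, relation, dst_type, src, dst, **attrs):
        # Refuse edges that point at nodes we have never seen.
        for node_type, node_id in ((src_type, src), (dst_type, dst)):
            if node_id not in self.node_features.get(node_type, {}):
                raise KeyError(f"unknown {node_type} node: {node_id}")
        self.edges[(src_type, relation, dst_type)].append((src, dst, attrs))

    def metadata(self):
        """Return (node types, edge types), both sorted for stable output."""
        return sorted(self.node_features), sorted(self.edges)

g = FraudGraph()
g.add_node("account", "a1", age_days=400, balance=1200.0, country="DE")
g.add_node("account", "a2", age_days=210, balance=950.0, country="DE")
g.add_node("device", "d1", os="android")

g.add_edge("account", "uses", "device", "a1", "d1", timestamp=1700000000)
g.add_edge("account", "uses", "device", "a2", "d1", timestamp=1700000300)
g.add_edge("account", "sends_to", "account", "a1", "a2",
           timestamp=1700000600, amount=250.0)

node_types, edge_types = g.metadata()
print(node_types)
print(edge_types)
```

Keeping edge attributes such as timestamp and amount on the edge itself matters later, because temporal filtering needs to know when each connection was observed.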
Why heterogeneous graphs matter
Different entity types carry different fraud signals. A device shared by 50 accounts is suspicious. A merchant with a 90% chargeback rate is suspicious. An IP address that appears to be in use from two countries at once is suspicious. Heterogeneous GNNs apply message passing with type-specific transformations, so they learn separate patterns for each entity type while still letting information flow across types.
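To show what a type-specific transformation means mechanically, here is a minimal sketch of one heterogeneous layer in plain Python. Each relation gets its own weight matrix, neighbor features are averaged per relation, and the transformed messages are summed at the target node. The weights are fixed by hand for illustration (a real model learns them), and the function names, relation names, and feature values are assumptions made for this example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def hetero_layer(target_ids, features, edges, weights, out_dim):
    """One message-passing step with a separate weight matrix per relation.

    features: node_id -> feature vector
    edges:    relation -> list of (source_id, target_id)
    weights:  relation -> matrix with `out_dim` rows
    """
    out = {t: [0.0] * out_dim for t in target_ids}
    for relation, pairs in edges.items():
        for t in target_ids:
            neighbors = [features[s] for s, d in pairs if d == t]
            if not neighbors:
                continue
            message = matvec(weights[relation], mean(neighbors))
            # Sum messages across relations at the target node.
            out[t] = [a + b for a, b in zip(out[t], message)]
    return out

# Devices have 2 features, merchants have 1: different types, different sizes.
features = {"dev1": [1.0, 0.0], "dev2": [0.0, 1.0], "m1": [0.9]}
edges = {
    "device_used_by_account": [("dev1", "acct"), ("dev2", "acct")],
    "merchant_paid_by_account": [("m1", "acct")],
}
weights = {
    "device_used_by_account": [[2.0, 0.0], [0.0, 2.0]],   # 2 -> 2
    "merchant_paid_by_account": [[1.0], [-1.0]],          # 1 -> 2
}

out = hetero_layer(["acct"], features, edges, weights, out_dim=2)
print(out["acct"])
```

Because each relation has its own matrix, device features and merchant features can have different sizes and still land in the same account embedding space.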
How GNNs detect fraud rings
The detection mechanism is message passing across the transaction graph:
Layer 1: direct connections
Each account node aggregates features from its direct neighbors: its devices, its transactions, its counterparties. After layer 1, the account embedding encodes: “I use 2 devices, made 47 transactions this month, and transact with 12 unique counterparties.”
Layer 2: two-hop neighborhood
Now each account sees its counterparties' neighborhoods. The embedding encodes: “My counterparties share 3 devices with each other, 6 of my 12 counterparties also transact with the same 2 merchant accounts, and 4 of them received initial funding from the same source.” This is the fraud ring signal. It is invisible at one hop.
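The growth of the receptive field with each layer can be illustrated without any learning at all. The sketch below is plain Python over a tiny made-up graph (the node names and the `k_hop` helper are hypothetical) and lists which nodes can influence an account after one and after two rounds of message passing.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_hop(adj, start, k):
    """Nodes reachable from `start` within k hops, excluding `start`."""
    seen = {start}
    frontier = {start}
    for _ in range(k):
        frontier = {n for f in frontier for n in adj.get(f, ())} - seen
        seen |= frontier
    return seen - {start}

# Account a0 transacts with a1, a2, a3. All three were funded by "src",
# and a1 and a2 share the device "dev_x".
edges = [
    ("a0", "a1"), ("a0", "a2"), ("a0", "a3"),
    ("src", "a1"), ("src", "a2"), ("src", "a3"),
    ("a1", "dev_x"), ("a2", "dev_x"),
]
adj = build_adjacency(edges)

one_hop = k_hop(adj, "a0", 1)
two_hop = k_hop(adj, "a0", 2)
# Counterparties of a0 that share the same funding source.
shared = sum(1 for c in one_hop if "src" in adj[c])

print(sorted(one_hop))
print(sorted(two_hop))
print(shared)
```

After one hop, a0 only knows its three counterparties. The common funding source and the shared device enter the picture at two hops, which is why a second layer is needed before the ring becomes visible.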
    import torch
    from torch_geometric.nn import SAGEConv, to_hetero
    from torch_geometric.data import HeteroData

    class FraudGNN(torch.nn.Module):
        def __init__(self, hidden_channels):
            super().__init__()
            # (-1, -1) lets PyG infer the input feature sizes lazily
            self.conv1 = SAGEConv((-1, -1), hidden_channels)
            self.conv2 = SAGEConv((-1, -1), hidden_channels)
            self.classifier = torch.nn.Linear(hidden_channels, 1)

        def forward(self, x, edge_index):
            x = self.conv1(x, edge_index).relu()  # layer 1: direct neighbors
            x = self.conv2(x, edge_index)         # layer 2: two-hop neighborhood
            return self.classifier(x)  # per-node fraud logit (apply a sigmoid for a probability)

    # Convert to a heterogeneous model automatically.
    # `data` is assumed to be a HeteroData object that already holds the fraud graph.
    model = FraudGNN(hidden_channels=64)
    model = to_hetero(model, data.metadata(), aggr='sum')

PyG's to_hetero() converts a homogeneous GNN into a heterogeneous one that handles multiple node and edge types automatically.
Production considerations
Deploying graph-based fraud detection at scale requires solving three challenges:
- Scale: Production fraud graphs can reach hundreds of millions of nodes, too many for full-graph training. GraphSAGE-style mini-batching with neighbor sampling trains on sampled subgraphs instead of the full graph.
- Latency: Real-time scoring requires sub-100ms inference. Pre-compute neighbor embeddings and only run the final layers on the local subgraph at inference time.
- Temporal integrity: The model must not use future information. Edges must be filtered by timestamp so that at prediction time, only past transactions are visible. This is where temporal heterogeneous graphs become essential.
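The scale and temporal-integrity points can be combined in one small routine: when sampling neighbors for a node, keep only edges observed before the prediction time, then cap the number of neighbors at a fixed fanout. This is a minimal sketch in plain Python, not a production sampler. The function name, the edge tuple layout, and the use of a strict `<` comparison are assumptions for illustration.

```python
import random

def sample_past_neighbors(edges, node, as_of, fanout, rng):
    """Sample up to `fanout` incoming neighbors of `node` seen before `as_of`.

    edges: list of (source_id, target_id, timestamp)
    Returns a list of (source_id, timestamp) pairs.
    """
    # Temporal filter: drop anything at or after the prediction time.
    candidates = [(s, ts) for s, d, ts in edges if d == node and ts < as_of]
    # Neighbor sampling: cap the neighborhood size to bound compute.
    if len(candidates) <= fanout:
        return candidates
    return rng.sample(candidates, fanout)

# Hypothetical transfers into account a9 with integer timestamps.
edges = [
    ("a1", "a9", 100),
    ("a2", "a9", 200),
    ("a3", "a9", 300),
    ("a4", "a9", 400),
    ("a5", "a9", 900),   # happens after the prediction time below
]

rng = random.Random(0)
sampled = sample_past_neighbors(edges, "a9", as_of=500, fanout=2, rng=rng)
early = sample_past_neighbors(edges, "a9", as_of=150, fanout=2, rng=rng)
print(sampled)
print(early)
```

The same filter has to be applied at training time with each label's own timestamp. Otherwise the model is trained on neighborhoods it will never see in production.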
Graph signals tabular models cannot capture
The following patterns exist only in graph structure:
- Ring topology: A circular flow of funds (A sends to B, B to C, ..., Z back to A) that launders money through apparent diversity of counterparties.
- Device sharing clusters: Multiple “independent” accounts controlled from the same small set of devices.
- Rapid fan-out: A single deposit split and forwarded through many intermediate accounts before consolidation. The graph shows the tree structure; tabular data shows individual transfers.
- Behavioral mimicry with structural anomaly: Each account mimics normal behavior (amounts, timing, frequency) but the subgraph connectivity pattern is abnormal.
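Of the patterns above, ring topology is the easiest to check explicitly. The sketch below is a bounded depth-first search in plain Python that looks for a directed cycle of transfers returning to a starting account. The `find_cycle` helper, the `max_len` bound, and the account names are hypothetical. A GNN learns such structure implicitly rather than running a search like this, so treat it as an illustration of the signal, not of the model.

```python
def find_cycle(transfers, start, max_len=6):
    """Return one directed cycle start -> ... -> start, or None.

    transfers: list of (sender_id, receiver_id)
    max_len:   maximum number of distinct accounts allowed in the cycle
    """
    adj = {}
    for sender, receiver in transfers:
        adj.setdefault(sender, []).append(receiver)

    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in adj.get(node, []):
            # Closing the loop requires at least two distinct accounts.
            if nxt == start and len(path) >= 2:
                return path + [start]
            if nxt not in path and len(path) < max_len:
                stack.append((nxt, path + [nxt]))
    return None

# Hypothetical transfers: A -> B -> C -> D -> A, plus a side payment B -> X.
transfers = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("B", "X")]

ring = find_cycle(transfers, "A")
no_ring = find_cycle(transfers, "X")
too_short = find_cycle(transfers, "A", max_len=3)
print(ring)
```

In a row-per-transfer table, these five transfers look like five unrelated payments between different counterparties. The cycle only exists once the rows are connected.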