The business problem
The UN estimates that $800 billion to $2 trillion is laundered globally each year. Banks spend $30+ billion annually on AML compliance, yet current systems detect less than 1% of illicit flows. The core problem: rule-based systems flag individual transactions (large amounts, unusual timing) but miss the structural patterns that define laundering.
A $9,500 cash deposit (just under the $10,000 reporting threshold) is easy for a rule to flag. But what about 50 transfers of $5,000 each flowing through a chain of shell companies and returning to the originator minus a 2% fee? Each individual transaction looks normal. The pattern is only visible in the graph.
Why flat ML fails
- No cycle detection: Circular transaction chains (A to B to C to A) are the hallmark of laundering. Flat models see individual transactions, not chains.
- No layering detection: Funds split across multiple intermediaries and recombine. This fan-out/fan-in pattern requires multi-hop graph analysis.
- Excessive false positives: Rule-based systems generate 95%+ false positives because they lack context. A high-value transfer between long-time business partners is normal; the same amount to a newly created shell company is suspicious. Context requires the graph.
- Slow adaptation: Launderers adapt to rules. Graph patterns are harder to evade because the fundamental need to move and recombine funds creates structural signatures.
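The round-trip chain described above can be made concrete with a small sketch. This uses hypothetical account IDs and a plain-Python DFS (no graph library), purely to illustrate that the cycle, not any single transfer, is the signal:

```python
# Toy transfer graph: each edge is an individually unremarkable transfer.
transfers = {
    "A": ["B"],   # originator sends to shell B
    "B": ["C"],   # B layers through C
    "C": ["A"],   # funds return to A minus a fee -> a 3-hop cycle
    "D": ["E"],   # normal one-way payment
    "E": [],
}

def find_cycle_from(start, graph, max_hops=4):
    """DFS for a path that returns to `start` within max_hops transfers."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in graph.get(node, []):
            if nxt == start and len(path) > 1:
                return path + [start]
            if nxt not in path and len(path) < max_hops:
                stack.append((nxt, path + [nxt]))
    return None

print(find_cycle_from("A", transfers))  # ['A', 'B', 'C', 'A']
print(find_cycle_from("D", transfers))  # None
```

No single edge in the cycle is anomalous; only the traversal exposes it, which is exactly the structural signature flat models cannot see.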
The relational schema
Node types:

```
Account (id, type, creation_date, country, kyc_level)
Entity (id, type, incorporation_date, industry)
Transaction (id, amount, currency, timestamp, channel)
```

Edge types:

```
Account --[owned_by]--> Entity
Account --[transfers_to]--> Account  (amount, timestamp)
Entity --[controls]--> Entity  (ownership_pct)
Transaction --[from]--> Account
Transaction --[to]--> Account
```

The account-entity-transaction graph captures ownership chains and money flows. Cycles in transfers_to edges signal potential laundering.
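One way to picture how this schema becomes model input: heterogeneous GNN libraries such as PyG key edge stores by (source_type, relation, destination_type) triples. A stdlib-only sketch with toy integer IDs (the relation triples mirror the edge types above; the IDs are invented):

```python
# Edge stores keyed by (source_type, relation, destination_type),
# mirroring the schema's edge types. IDs are toy examples.
edge_index_dict = {
    ("account", "transfers_to", "account"): [(0, 1), (1, 2), (2, 0)],  # a cycle
    ("account", "owned_by", "entity"):      [(0, 0), (1, 0), (2, 1)],
    ("entity",  "controls",  "entity"):     [(0, 1)],
}

def out_degree(edges, src):
    """Number of edges of one relation leaving node `src`."""
    return sum(1 for s, _ in edges if s == src)

# Account 0 makes one transfer and is owned by entity 0.
transfers = edge_index_dict[("account", "transfers_to", "account")]
print(out_degree(transfers, 0))  # 1
```

Keeping each relation in its own store is what lets the model learn different message functions for ownership edges versus money-flow edges.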
PyG architecture: GATConv for cycle-aware AML
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, HeteroConv, Linear


class AMLGNN(torch.nn.Module):
    def __init__(self, hidden_dim=64, heads=4):
        super().__init__()
        # Lazy input layers: in_channels=-1 is inferred on first forward pass
        self.account_lin = Linear(-1, hidden_dim)
        self.entity_lin = Linear(-1, hidden_dim)

        # 3 layers for cycle detection (3-hop paths)
        self.convs = torch.nn.ModuleList()
        for _ in range(3):
            conv = HeteroConv({
                ('account', 'transfers_to', 'account'): GATConv(
                    hidden_dim, hidden_dim // heads, heads=heads),
                # Bipartite edge type: self-loops are undefined between
                # distinct node types, so disable them.
                ('account', 'owned_by', 'entity'): GATConv(
                    (hidden_dim, hidden_dim), hidden_dim // heads,
                    heads=heads, add_self_loops=False),
                ('entity', 'controls', 'entity'): GATConv(
                    hidden_dim, hidden_dim // heads, heads=heads),
            }, aggr='sum')
            self.convs.append(conv)

        self.classifier = torch.nn.Sequential(
            Linear(hidden_dim, 32),
            torch.nn.ReLU(),
            Linear(32, 1),
        )

    def forward(self, x_dict, edge_index_dict):
        x_dict['account'] = self.account_lin(x_dict['account'])
        x_dict['entity'] = self.entity_lin(x_dict['entity'])
        for conv in self.convs:
            x_dict = {k: F.elu(v)
                      for k, v in conv(x_dict, edge_index_dict).items()}
        return torch.sigmoid(
            self.classifier(x_dict['account']).squeeze(-1))
```

The 3-layer GATConv stack enables cycle detection: each account sees its 3-hop neighborhood, including paths that circle back to itself. Attention weights identify which transaction paths are most suspicious.
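The link between layer count and cycle length can be checked directly: each message-passing layer propagates information one hop, and an account sits on a directed 3-cycle exactly when the cube of the adjacency matrix has a nonzero diagonal entry for it. A stdlib-only sketch on a toy 4-account graph (no PyG required):

```python
def matmul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Directed transfers among 4 accounts: 0->1->2->0 is a cycle, 3 is a sink.
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 1],
     [0, 0, 0, 0]]

A3 = matmul(matmul(A, A), A)
# A3[i][i] counts directed 3-hop walks from i back to i, i.e. 3-cycles
# through i -- exactly the round trips a 3-layer model's receptive
# field can cover.
on_cycle = [i for i in range(4) if A3[i][i] > 0]
print(on_cycle)  # [0, 1, 2]
```

This is also why a 2-layer model would miss the pattern: A squared has a zero diagonal here, so no account's 2-hop neighborhood contains its own return path.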
Training considerations
- Label scarcity: Confirmed laundering cases are rare and often discovered months later. Use suspicious activity reports (SARs) as weak labels and supplement with synthetic laundering patterns.
- Temporal ordering: Transaction order matters. Use temporal edge features and ensure the model only sees transactions before the prediction timestamp.
- Graph snapshots: Build daily or weekly graph snapshots. Laundering patterns evolve, and the model should see the graph at the time of each prediction.
- False positive optimization: Optimize for precision at high recall thresholds. Compliance teams need manageable alert volumes, not maximum recall.
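The last point can be sketched as a small evaluation helper: sweep the score threshold down until a target recall is reached, then report precision there, which is the number compliance teams experience as alert quality. The scores and labels below are invented for illustration:

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the highest threshold achieving >= target_recall."""
    pairs = sorted(zip(scores, labels), reverse=True)  # highest score first
    total_pos = sum(labels)
    tp = fp = 0
    for score, label in pairs:
        tp += label
        fp += 1 - label
        if tp / total_pos >= target_recall:
            return tp / (tp + fp)
    return 0.0

scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1,    0,    1,    0,    0,    1,    0,    0]  # 3 confirmed cases
print(precision_at_recall(scores, labels, 0.66))  # 2 of top 3 alerts are true
```

Optimizing this quantity directly, rather than raw recall, keeps the alert queue reviewable.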
Expected performance
- Rule-based system: ~40 AUROC (high recall, 95%+ false positive rate)
- LightGBM (flat-table): 62.44 AUROC
- GNN (3-layer GATConv): 75.83 AUROC
- KumoRFM (zero-shot): 76.71 AUROC
Or use KumoRFM in one line
```
PREDICT is_suspicious FOR account
USING account, entity, transaction
```

One PQL query. KumoRFM constructs the temporal transaction graph, detects cycle and layering patterns automatically, and outputs suspicion scores per account.
KumoRFM replaces graph construction, cycle-aware architecture design, and training with a single query. It achieves 76.71 AUROC zero-shot while providing the prediction explanations needed for SAR filing and regulatory review.