Transaction Graphs: Financial Transactions as Directed Temporal Graphs

Every financial transaction is a directed edge from sender to receiver. When you assemble millions of these edges into a graph, fraud rings, money laundering chains, and suspicious patterns become visible structures that GNNs learn to detect.

TL;DR

  • Transaction graphs: accounts are nodes, transactions are directed temporal edges (sender to receiver) with features like amount, timestamp, and channel. The graph evolves as new transactions arrive.
  • Fraud patterns are structural: money laundering creates circular flows, account takeover creates unusual counterparty connections, organized fraud creates dense coordinated subgraphs. Flat tables cannot see these.
  • GNNs add 10-30% fraud detection lift over flat-table models because they capture neighborhood context: an account's risk depends on who it transacts with, who they transact with, and the flow patterns.
  • Temporal integrity is critical: temporal sampling and temporal splits prevent the model from seeing future transactions. Without them, fraud detection metrics are inflated to the point of being meaningless.
  • Scale: major banks process 10-50M transactions/day. Real-time fraud scoring requires incremental GNN inference: each new edge is processed in milliseconds, without re-processing the entire graph.

A transaction graph represents financial activity as a directed temporal graph where accounts are nodes and transactions are timestamped directed edges. Account A sending $500 to account B on March 1 becomes a directed edge from A to B with features (amount=$500, date=Mar 1, channel=wire). When millions of these edges are assembled, structural patterns emerge that reveal fraud, money laundering, and credit risk.
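As a minimal sketch of this representation, the plain-Python snippet below stores each transaction as a directed, timestamped edge and groups edges by sender. The `Transaction` class, the `build_out_edges` helper, the account names, and the year 2024 are illustrative assumptions, not part of any library or of the original example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    """One directed, timestamped edge: sender -> receiver (hypothetical schema)."""
    sender: str
    receiver: str
    amount: float
    when: date
    channel: str


def build_out_edges(transactions):
    """Group edges by sender; each sender's list is ordered by date."""
    out_edges = defaultdict(list)
    for tx in sorted(transactions, key=lambda t: t.when):
        out_edges[tx.sender].append(tx)
    return dict(out_edges)


# The year is assumed; the text only says "March 1".
transactions = [
    Transaction("A", "B", 500.0, date(2024, 3, 1), "wire"),
    Transaction("B", "C", 450.0, date(2024, 3, 2), "ach"),
    Transaction("A", "C", 20.0, date(2024, 2, 27), "card"),
]
out_edges = build_out_edges(transactions)
receivers_of_a = [tx.receiver for tx in out_edges["A"]]
```

Keeping the direction and timestamp on every edge is what lets later steps ask who paid whom, and in what order.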

Why graphs transform fraud detection

Traditional fraud detection uses transaction-level features: amount, time, location, merchant category. This misses structural patterns:

  • Money laundering cycles: A → B → C → A. Money flows in a circle to obscure its origin. Invisible in a single transaction record, obvious in a graph.
  • Mule networks: A central fraudster distributes stolen funds across many accounts that then withdraw cash. The star topology is a clear graph signature.
  • Account takeover: A legitimate account suddenly transacts with counterparties it has never interacted with. The graph neighborhood changes dramatically.
  • Coordinated fraud: Multiple accounts making similar transactions to the same merchants in the same time window. Dense temporal subgraphs.
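To make the first pattern concrete, here is a small sketch that enumerates short directed cycles such as A → B → C → A with a depth-limited search. It is a hand-written illustration of the motif, not how a GNN detects it; the function name `find_cycles` and the `max_len` parameter are assumptions made for this example.

```python
from collections import defaultdict


def find_cycles(edges, max_len=4):
    """Return simple directed cycles with at most max_len edges.

    edges: iterable of (sender, receiver) pairs.
    Each cycle is reported once, as a tuple that starts at its smallest node.
    """
    graph = defaultdict(set)
    for sender, receiver in edges:
        graph[sender].add(receiver)

    cycles = set()

    def walk(start, node, path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                # Closing the loop: the cycle has len(path) edges.
                cycles.add(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                # Only visit nodes larger than start so each cycle is found once.
                walk(start, nxt, path + [nxt])

    for start in list(graph):
        walk(start, start, [start])
    return sorted(cycles)


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
cycles = find_cycles(edges)
```

On these four edges the search finds the single cycle through A, B, and C, and ignores the dead-end edge to D. Exhaustive enumeration like this becomes expensive on large graphs, which is part of why learned models are attractive.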

Graph construction

A production transaction graph is heterogeneous and temporal:

  • Node types: accounts, merchants, devices, IP addresses, card numbers
  • Edge types: transactions (directed, temporal), shared-device (account-device), shared-merchant (account-merchant)
  • Node features: account age, average balance, transaction frequency, KYC status
  • Edge features: amount, timestamp, channel (wire, ACH, card), currency

Multiple edge types provide different signals. Direct transactions carry the strongest fraud signal. Shared-device edges reveal accounts controlled by the same person. Shared-merchant edges provide weaker but useful context.
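The sketch below shows one way such a heterogeneous graph could be held in plain Python, keying edges by a (source type, relation, destination type) triple. The `HeteroGraph` class, the relation names `sends` and `uses`, and all identifiers and feature names are hypothetical choices for illustration.

```python
from collections import defaultdict


class HeteroGraph:
    """Toy heterogeneous graph: typed nodes, typed edges, features on both."""

    def __init__(self):
        self.nodes = defaultdict(dict)  # node_type -> {node_id: features}
        self.edges = defaultdict(list)  # (src_type, relation, dst_type) -> edges

    def add_node(self, node_type, node_id, **features):
        self.nodes[node_type][node_id] = features

    def add_edge(self, src_type, relation, dst_type, src, dst, **features):
        # Reject edges whose endpoints were never registered.
        for node_type, node_id in ((src_type, src), (dst_type, dst)):
            if node_id not in self.nodes[node_type]:
                raise KeyError(f"unknown {node_type}: {node_id}")
        self.edges[(src_type, relation, dst_type)].append((src, dst, features))

    def shared_targets(self, src_type, relation, dst_type):
        """Map each target (e.g. a device) to the sources that share it."""
        groups = defaultdict(set)
        for src, dst, _ in self.edges.get((src_type, relation, dst_type), []):
            groups[dst].add(src)
        return {d: sorted(s) for d, s in groups.items() if len(s) > 1}


g = HeteroGraph()
g.add_node("account", "acct_1", age_days=900, kyc_verified=True)
g.add_node("account", "acct_2", age_days=3, kyc_verified=False)
g.add_node("account", "acct_3", age_days=5, kyc_verified=False)
g.add_node("device", "dev_9")
g.add_edge("account", "sends", "account", "acct_1", "acct_2",
           amount=500.0, timestamp=1709251200, channel="wire", currency="USD")
g.add_edge("account", "uses", "device", "acct_2", "dev_9")
g.add_edge("account", "uses", "device", "acct_3", "dev_9")
shared = g.shared_targets("account", "uses", "device")
```

Here `shared` links `acct_2` and `acct_3` through `dev_9`, the kind of shared-device signal described above. PyTorch Geometric's `HeteroData` container uses a similar triple-keyed convention for edge types.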

Temporal integrity

Transaction graphs are inherently temporal, and temporal integrity is non-negotiable:

  • Temporal sampling: When scoring a transaction at time T, the GNN can only see transactions that occurred before T. Future transactions leak the outcome.
  • Temporal splits: Train on January-March, test on April. Never use random splits, which mix future and past.
  • Consequence feature removal: Remove features created by the fraud investigation process (account freeze, chargeback) from training data.
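The three rules above can be sketched in a few lines of plain Python. The dictionary layout, the function names, the year 2024, and the feature names in `CONSEQUENCE_FEATURES` are assumptions for illustration; a real pipeline would use its own schema.

```python
from datetime import date

# Hypothetical names for features that only exist after an investigation.
CONSEQUENCE_FEATURES = {"account_frozen", "chargeback_filed"}


def temporal_split(transactions, cutoff):
    """Train on edges strictly before cutoff, test on edges at or after it."""
    train = [tx for tx in transactions if tx["ts"] < cutoff]
    test = [tx for tx in transactions if tx["ts"] >= cutoff]
    return train, test


def visible_history(transactions, account, score_time):
    """Edges touching the account that occurred strictly before score_time."""
    return [
        tx for tx in transactions
        if tx["ts"] < score_time and account in (tx["src"], tx["dst"])
    ]


def drop_consequence_features(features):
    """Remove features that leak the outcome of the fraud investigation."""
    return {k: v for k, v in features.items() if k not in CONSEQUENCE_FEATURES}


transactions = [
    {"src": "A", "dst": "B", "ts": date(2024, 1, 15)},
    {"src": "B", "dst": "C", "ts": date(2024, 3, 30)},
    {"src": "C", "dst": "A", "ts": date(2024, 4, 2)},
    {"src": "A", "dst": "D", "ts": date(2024, 4, 20)},
]
# January-March for training, April for testing.
train, test = temporal_split(transactions, cutoff=date(2024, 4, 1))
# Scoring the April 20 transaction: it must not see itself or anything later.
history = visible_history(transactions, "A", score_time=date(2024, 4, 20))
clean = drop_consequence_features({"account_age_days": 400, "account_frozen": True})
```

The strict `<` comparison in `visible_history` matters: the transaction being scored is excluded from its own context.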

Real-time inference

Production fraud detection requires scoring each transaction in real time (under 100ms). This means the GNN cannot re-process the entire graph for each new transaction. Instead:

  1. Maintain pre-computed node embeddings for all accounts
  2. When a new transaction arrives, fetch embeddings for sender, receiver, and their neighbors
  3. Run a lightweight GNN forward pass on the local subgraph
  4. Update the sender and receiver embeddings with the new transaction information

This incremental approach processes each transaction in 10-50ms while maintaining graph context that accumulates over millions of historical transactions.
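The four steps can be sketched as a small cache-and-update loop. This is a toy under stated assumptions, not a production design: mean aggregation stands in for a trained GNN layer, an exponential moving average stands in for the embedding update, and `IncrementalScorer`, `toy_model`, and the `decay` parameter are hypothetical names.

```python
from collections import defaultdict


class IncrementalScorer:
    """Scores each new edge from cached embeddings, then updates the cache."""

    def __init__(self, model, dim, decay=0.9):
        self.model = model  # callable: (sender_ctx, receiver_ctx, edge) -> score
        self.decay = decay
        # Step 1: pre-computed embeddings, here initialised to zeros.
        self.embeddings = defaultdict(lambda: [0.0] * dim)
        self.neighbors = defaultdict(set)

    def _context(self, account):
        # Step 2: fetch the account's embedding and those of its neighbors,
        # then average them (a stand-in for one message-passing layer).
        vectors = [self.embeddings[account]]
        vectors += [self.embeddings[n] for n in sorted(self.neighbors[account])]
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    def _update(self, account, edge_features):
        old = self.embeddings[account]
        self.embeddings[account] = [
            self.decay * o + (1.0 - self.decay) * f
            for o, f in zip(old, edge_features)
        ]

    def process(self, sender, receiver, edge_features):
        # Step 3: score on the local subgraph only, before any update,
        # so the score depends on past transactions alone.
        score = self.model(self._context(sender), self._context(receiver),
                           edge_features)
        # Step 4: fold the new transaction into both endpoint embeddings.
        self._update(sender, edge_features)
        self._update(receiver, edge_features)
        self.neighbors[sender].add(receiver)
        self.neighbors[receiver].add(sender)
        return score


def toy_model(sender_ctx, receiver_ctx, edge_features):
    """Hypothetical score: how far the edge is from the sender's history."""
    return sum(abs(f - s) for f, s in zip(edge_features, sender_ctx))


scorer = IncrementalScorer(toy_model, dim=2, decay=0.5)
first = scorer.process("A", "B", [1.0, 0.0])
second = scorer.process("A", "C", [1.0, 0.0])
```

Each call touches only the two endpoints and their cached neighbors, so the cost per transaction is independent of total graph size. The second, similar transaction from A scores lower than the first because A's history now contains a comparable edge.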

Results at scale

Transaction graph GNNs deployed at financial institutions report:

  • 15-25% reduction in false positive rates at the same fraud catch rate
  • Detection of organized fraud rings that flat-table models miss entirely
  • Earlier detection: graph signals appear 2-3 days before individual account anomalies
  • $50-200M in fraud prevented annually at major banks

Frequently asked questions

What is a transaction graph?

A transaction graph represents financial activity as a directed temporal graph. Accounts (or entities like merchants, devices) are nodes. Transactions are directed edges from sender to receiver, with features like amount, timestamp, channel, and currency. The graph is temporal: edges have timestamps, and the structure evolves over time.

Why are transaction graphs effective for fraud detection?

Fraud patterns are fundamentally structural: money laundering involves circular flows, account takeover involves unusual transaction partners, and organized fraud involves coordinated accounts. These patterns are invisible in flat transaction tables but clearly visible as graph motifs (cycles, stars, dense subgraphs).

How large are production transaction graphs?

A major bank processes 10-50 million transactions per day. Over a 90-day window that is roughly 0.9-4.5 billion edges, in a graph with 100M+ nodes. Real-time fraud detection requires processing each new transaction within milliseconds, which calls for efficient incremental GNN inference.
