If you search "best algorithm for fraud detection," you will get a dozen blog posts that all say the same thing: XGBoost. And they are not wrong - for a specific type of fraud. XGBoost is excellent at scoring individual transactions against tabular features like amount, time of day, merchant category, and velocity counts. It has earned its reputation.
But here is what those posts leave out: the fraud that is actually growing fastest - organized rings, synthetic identity networks, coordinated account takeovers - produces signals that do not live in any single transaction row. The signal is in the connections between accounts. And XGBoost cannot read connections. Neither can random forest, LightGBM, or any model that takes a flat table as input.
This is not a theoretical gap. It is a structural one. And understanding it changes how you architect fraud detection.
The three algorithms, compared directly
Before getting into the details, here is a head-to-head comparison across the dimensions that matter most for fraud detection teams.
| Dimension | XGBoost | Random Forest | Graph Neural Network (GNN) |
|---|---|---|---|
| Input format | Flat table (one row per transaction) | Flat table (one row per transaction) | Graph of connected entities (accounts, devices, addresses, transactions) |
| Best fraud type | Individual anomalous transactions | Individual anomalous transactions | Organized rings, synthetic identity networks, coordinated attacks |
| Handles missing values | Yes - natively | Requires imputation | Depends on implementation |
| Training speed | Fast (minutes to hours) | Fast (minutes to hours) | Slower (hours to days for custom GNNs); seconds for KumoRFM zero-shot |
| Interpretability | High - SHAP values, feature importance | High - feature importance, decision paths | Moderate - attention weights, subgraph explanations |
| Can detect shared-device fraud rings | No - cannot see cross-entity connections | No - cannot see cross-entity connections | Yes - reads device-account-transaction graph directly |
| Multi-hop pattern detection | No - limited to single-row features | No - limited to single-row features | Yes - propagates signals 6-7+ hops across entity graph |
| Feature engineering required | Heavy - velocity features, aggregations, time windows | Heavy - same as XGBoost | Minimal with foundation model approach; heavy with custom GNN |
| False positive rate | Moderate - good on known patterns, blind to ring context | Moderate to high - ensemble averaging can over-trigger | Lower for organized fraud - graph context reduces false alarms |
| Production maturity | Very high - industry standard since 2016 | Very high - established since 2001 | Growing - major banks deploying since 2022 |
| Scalability | Excellent - handles billions of rows | Good - memory-heavy on large datasets | Depends on graph size; managed platforms handle enterprise scale |
Head-to-head comparison across 11 dimensions. XGBoost and random forest dominate single-transaction detection. GNNs dominate organized fraud and ring detection. The two approaches are complementary, not competing.
How each algorithm works for fraud detection
XGBoost
XGBoost has been the go-to fraud detection algorithm at banks, fintechs, and payment processors for nearly a decade. It builds an ensemble of decision trees sequentially, where each new tree corrects errors from the previous ones. For single-transaction fraud - a stolen credit card used at an unusual merchant, a suspiciously large wire transfer, an account login from a new country - XGBoost is hard to beat. It handles missing values natively (common in fraud data), trains quickly on large datasets, and produces well-calibrated probability scores that fraud operations teams can threshold and act on.
- Best for: Individual transaction scoring on tabular features (amount, velocity, merchant category, time of day). Production-proven at scale with fast training and inference.
- Watch out for: Cannot see connections between entities. Blind to fraud rings, shared-device networks, and coordinated attacks because it treats each transaction as an independent row.
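The sequential error-correction loop that makes boosting work can be sketched in a few lines. This is a toy illustration of the core idea, not XGBoost itself (no regularization, no second-order gradients, a single feature), and the transaction amounts and labels are made up:

```python
# Toy sequential boosting: each stump fits the residual errors of the
# ensemble built so far, so later trees correct earlier mistakes.

def stump_fit(xs, residuals):
    """Find the single-feature threshold split that best fits the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

# Made-up data: transaction amount vs. fraud label (1 = fraud).
amounts = [20, 35, 50, 900, 1200, 40, 1500, 25]
labels  = [0,  0,  0,  1,   1,    0,  1,    0]

pred = [0.0] * len(amounts)   # ensemble prediction so far
lr = 0.5                      # learning rate (shrinkage)
for _ in range(10):           # 10 boosting rounds
    residuals = [y - p for y, p in zip(labels, pred)]
    stump = stump_fit(amounts, residuals)
    pred = [p + lr * stump(x) for p, x in zip(pred, amounts)]

loss = sum((y - p) ** 2 for y, p in zip(labels, pred))
print(round(loss, 6))
```

Each round shrinks the remaining error geometrically, which is why boosted ensembles converge fast on clean tabular signals. Note that every input is still one independent row: nothing in this loop can represent a connection between two transactions.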
Random Forest
Random forest builds hundreds of independent decision trees on random subsets of the data and averages their predictions. For fraud detection, it is a solid alternative to XGBoost. It is more robust to noisy fraud labels (a real problem since fraud labels are often delayed or incomplete) and less prone to overfitting on small datasets. But on clean, well-labeled fraud datasets, XGBoost consistently edges it out by 1-3% in accuracy. That is why XGBoost became the industry default.
- Best for: Fraud detection on noisy or incomplete labels, smaller datasets, and situations where model stability matters more than squeezing out maximum accuracy.
- Watch out for: Same structural limitation as XGBoost - cannot read connections between entities. Also more memory-heavy on large datasets and slightly less accurate than XGBoost on clean data.
Graph Neural Networks (GNNs)
A graph neural network models entities (accounts, devices, addresses, transactions) as nodes and their relationships as edges, then propagates fraud signals across those connections. GNNs catch organized fraud that flat-table models structurally cannot see: shared-device rings, synthetic identity networks, money mule chains, and coordinated account takeovers. Real-world fraud rings are typically 6-7 hops deep in entity graphs, and GNNs traverse these paths automatically rather than requiring manual feature engineering at each hop.
- Best for: Organized fraud and ring detection where the signal is in the connections between entities - synthetic identities, device-sharing clusters, money mule chains, coordinated account takeovers.
- Watch out for: Custom GNNs require graph infrastructure and longer training times (hours to days). Interpretability is moderate compared to tree-based models. KumoRFM eliminates the infrastructure burden by handling graph construction automatically from raw relational tables.
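The propagation step at the heart of a GNN can be shown on a tiny account-device graph. Real GNNs learn their aggregation weights from data; this sketch uses a fixed mean aggregation, and the accounts, devices, and edges are invented for illustration:

```python
# Hand-rolled message passing on a small account-device graph: a fraud
# signal on one account spreads to accounts that share its devices.

edges = [  # (account, device) pairs -- hypothetical data
    ("acct_A", "dev_1"), ("acct_B", "dev_1"),
    ("acct_B", "dev_2"), ("acct_C", "dev_2"),
    ("acct_D", "dev_3"),  # isolated, unconnected account
]

# Build an undirected adjacency list over accounts and devices.
adj = {}
for a, d in edges:
    adj.setdefault(a, set()).add(d)
    adj.setdefault(d, set()).add(a)

# Initial fraud scores: only acct_A is a confirmed fraud case.
score = {n: 0.0 for n in adj}
score["acct_A"] = 1.0

# Two rounds of propagation: each node mixes its own score with the
# mean of its neighbors' scores (a crude, unlearned aggregation).
for _ in range(2):
    score = {
        n: 0.5 * s + 0.5 * (sum(score[m] for m in adj[n]) / len(adj[n]))
        for n, s in score.items()
    }

# acct_B shares dev_1 with the fraudster, so suspicion reaches it;
# acct_D has no path to the fraud and stays at zero.
print(score["acct_B"] > score["acct_D"])
```

Stacking more rounds lets the signal travel further, which is how a deep GNN covers the 6-7 hop paths typical of real rings without any hand-built hop features.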
The org chart problem: why flat tables fail on fraud rings
Here is the simplest way to understand the limitation. Imagine your company's org chart - a tree of who reports to whom, which teams collaborate, which departments share resources. Now flatten that into a spreadsheet: one row per employee, columns for name, title, salary, and department.
You have lost the structure. You cannot tell who reports to whom. You cannot see which teams are connected. You cannot identify the VP whose entire department is underperforming. The list of names contains the same people, but the relationships that give the org chart its meaning are gone.
This is exactly what happens when you flatten fraud network data into a feature table for XGBoost. You start with a rich graph of connections - Account A shares a device with Account B, which shares a shipping address with Account C, which used the same phone number as Account D. You flatten it into per-transaction rows with aggregate columns like num_shared_devices = 2 and address_reuse_count = 3.
Those aggregate counts capture some signal. But they destroy the topology. XGBoost sees that an account has 2 shared devices. It cannot see that those 2 shared devices connect to 47 other accounts in a pattern that matches known fraud rings. The number tells you something. The graph tells you everything.
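The point is easy to demonstrate concretely. In the sketch below (all account and device names are invented), two accounts produce the identical flat feature num_shared_devices = 2, but a breadth-first walk over the graph shows one sits in a two-person household and the other fans out to a ten-account ring:

```python
# Identical aggregate feature, completely different topology.
from collections import deque

edges = {
    # acct_X's two devices stay inside a small household
    "acct_X": ["dev_1", "dev_2"],
    "dev_1": ["acct_X", "acct_spouse"],
    "dev_2": ["acct_X"],
    "acct_spouse": ["dev_1"],
    # acct_Y's two devices fan out to a ring of other accounts
    "acct_Y": ["dev_3", "dev_4"],
    "dev_3": ["acct_Y"] + [f"ring_{i}" for i in range(5)],
    "dev_4": ["acct_Y"] + [f"ring_{i}" for i in range(5, 10)],
}
for i in range(5):
    edges[f"ring_{i}"] = ["dev_3"]
for i in range(5, 10):
    edges[f"ring_{i}"] = ["dev_4"]

def accounts_within(node, hops):
    """Count distinct other accounts reachable within `hops` edges."""
    seen, frontier = {node}, deque([(node, 0)])
    while frontier:
        cur, d = frontier.popleft()
        if d == hops:
            continue
        for nxt in edges.get(cur, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return sum(1 for n in seen if n.startswith(("acct_", "ring_"))) - 1

# The flat feature is identical for both accounts...
assert len(edges["acct_X"]) == len(edges["acct_Y"]) == 2
# ...but the 2-hop neighborhoods are not.
print(accounts_within("acct_X", 2), accounts_within("acct_Y", 2))
```

Any model that only sees the aggregate column treats these two accounts as the same risk. A model that reads the graph does not.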
What fraud rings look like in practice
A fraud ring is not one bad actor with a stolen credit card. It is an organized operation where multiple accounts - sometimes hundreds - coordinate to exploit a system. Here are the patterns that GNNs catch and flat-table models miss:
- Synthetic identity rings. Fraudsters create fake identities by combining real Social Security numbers (often from children or deceased individuals) with fabricated names and addresses. They build credit over months, then "bust out" - maxing out all credit lines simultaneously. Each individual account looks legitimate. The ring pattern (shared address fragments, similar application timing, connected credit inquiries) is only visible in the graph.
- Device-sharing networks. Twenty accounts that all logged in from the same three devices within a 48-hour window. XGBoost sees each account individually and might flag the device fingerprint if it was manually engineered as a feature. A GNN sees the full cluster of 20 accounts connected through 3 device nodes and recognizes the coordinated pattern.
- Money mule chains. Stolen funds move through a chain of accounts: Account A sends to B, B splits to C and D, C and D send to E, E withdraws. Each individual transfer might be below reporting thresholds. The chain structure - visible only in the transaction graph - is the signal.
- Account takeover clusters. Compromised credentials are sold in batches. The accounts taken over in a single batch share behavioral fingerprints: similar login timing, same credential-testing patterns, similar first actions post-takeover. A GNN detects these temporal and behavioral clusters across the account graph.
- Return fraud rings. Groups that coordinate return abuse across multiple accounts and store locations, staying below individual detection thresholds. The graph reveals the shared addresses, payment methods, and timing patterns that connect them.
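The mule-chain pattern above is worth making concrete, because it shows why per-row scoring fails even in principle. In this sketch (accounts, amounts, and the reporting threshold are all invented), every individual transfer looks innocent, yet a simple traversal of the transfer graph connects the cash-out account to the compromised source:

```python
# A mule chain is just a path in the transfer graph.
transfers = [
    ("acct_A", "acct_B", 900),
    ("acct_B", "acct_C", 450), ("acct_B", "acct_D", 450),
    ("acct_C", "acct_E", 440), ("acct_D", "acct_E", 440),
]
flagged = {"acct_A"}   # known-compromised source account
THRESHOLD = 1000       # hypothetical per-transfer reporting threshold

out_edges = {}
for src, dst, amt in transfers:
    out_edges.setdefault(src, []).append(dst)

# Every transfer individually stays under the threshold...
assert all(amt < THRESHOLD for _, _, amt in transfers)

# ...but a traversal from the flagged account reaches the cash-out point.
def downstream(start):
    seen, stack = set(), [start]
    while stack:
        cur = stack.pop()
        for nxt in out_edges.get(cur, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print("acct_E" in downstream("acct_A"))
```

A flat table scores each of those five transfers in isolation and clears all of them; the chain only exists as a property of the graph.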
Why you cannot just engineer more features
The most common response from XGBoost-trained teams is: "We will just add graph-derived features to our flat table." Compute some graph metrics - degree centrality, PageRank, community detection scores - flatten them into columns, and feed them to XGBoost.
This helps. It typically adds 3-5% accuracy. But it has hard limits:
- Static snapshots. Graph metrics computed offline are stale by the time XGBoost uses them. Fraud rings evolve hourly. A pre-computed PageRank score from yesterday's batch run misses the ring that formed this morning.
- Fixed hop distance. Manual features capture 1-2 hops at most (direct neighbors, maybe neighbors of neighbors). Real fraud rings are 6-7 hops deep. Engineering features at that depth creates a combinatorial explosion that is not practical to maintain.
- Lost subgraph structure. Flattening a ring's topology into a single number (like a centrality score) loses the shape of the ring. Two accounts can have identical centrality scores but completely different fraud risk because of how their neighborhoods are structured. The GNN sees the structure. The aggregate column does not.
- Engineering cost. Each graph feature requires custom pipeline code: extract the graph, compute the metric, join it back to the transaction table, handle temporal windowing, maintain the pipeline as the graph schema changes. For teams already spending 12+ hours on feature engineering per task, adding graph features doubles the complexity.
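A quick back-of-envelope calculation shows why the hop-depth limit bites. The fan-out of 5 connections per entity is an assumed, illustrative number, not measured data:

```python
# With an average fan-out of 5 connections per entity, the neighborhood
# a hand-engineered feature would have to summarize grows geometrically
# with hop depth.
fanout = 5
per_hop = [fanout ** k for k in range(1, 8)]  # entities at hops 1..7
total = sum(per_hop)
print(per_hop)
print(total)
```

At 1-2 hops the numbers are manageable; by hop 7 a single feature would have to summarize tens of thousands of entities, which is why manual pipelines stop short of the depth where real rings live.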
The benchmark evidence
The SAP SALT benchmark tests prediction accuracy on real enterprise relational data - the kind of multi-table structure where fraud data actually lives. Here is how the approaches compare:
| Approach | Accuracy | What it means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert-tuned XGBoost by 16 percentage points. The gap comes from relational patterns that a flat feature table structurally cannot contain.
On the RelBench benchmark across 7 databases and 30 prediction tasks:
| Approach | AUROC | Feature engineering time |
|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |
KumoRFM zero-shot outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning pushes the gap to nearly 19 points.
A practical migration path: tabular first, graph when ready
This is not an all-or-nothing decision. Most fraud teams already have XGBoost models in production, and ripping them out is not realistic. The smart path is layered:
- Keep your XGBoost models running. They catch single-transaction fraud well. Do not break what works.
- Add graph-based scoring as a second layer. Use a GNN or relational foundation model to score entities based on their graph neighborhood. This catches the ring patterns your XGBoost models miss.
- Combine scores in your decisioning layer. Weight tabular and graph scores based on fraud type. Individual card fraud leans on XGBoost. Organized ring patterns lean on the graph score.
- Gradually shift to a unified model. KumoRFM 2.0 supports both single-table predictions (similar to what XGBoost does on tabular features) and multi-table relational predictions (graph-based). Over time, you can consolidate into a single platform that handles both fraud types without maintaining two separate pipelines.
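The score-combination step in that layered setup can start very simply: a per-fraud-type weighted sum of the tabular and graph scores. The weights, fraud-type names, and scores below are hypothetical placeholders, not tuned or recommended values:

```python
# Minimal decisioning-layer sketch: blend the tabular (XGBoost) score
# and the graph score with weights chosen per fraud type.
WEIGHTS = {  # (tabular_weight, graph_weight) -- illustrative only
    "card_not_present": (0.8, 0.2),
    "organized_ring":   (0.3, 0.7),
}

def combined_score(tabular, graph, fraud_type):
    wt, wg = WEIGHTS[fraud_type]
    return wt * tabular + wg * graph

# A transaction that looks clean row-by-row (tabular 0.2) but sits in
# a suspicious neighborhood (graph 0.9):
print(round(combined_score(0.2, 0.9, "organized_ring"), 2))
print(round(combined_score(0.2, 0.9, "card_not_present"), 2))
```

In production the weights would be fit on labeled outcomes rather than hard-coded, but the structure is the same: individual card fraud leans on the tabular score, ring patterns lean on the graph score.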
XGBoost-only fraud detection
- Engineer tabular features: velocity, amount, time, merchant category (8-12 hours)
- Train XGBoost on single-transaction features
- Catches individually anomalous transactions
- Blind to fraud rings, shared-device networks, and coordinated attacks
- Manually add graph-derived features for partial ring detection (+3-5% accuracy)
- Maintain two pipelines: tabular features + graph features
KumoRFM fraud detection
- Connect to data warehouse - accounts, transactions, devices, addresses
- Write PQL: PREDICT is_fraud FOR EACH transactions.transaction_id
- Model reads all tables and discovers both tabular and relational fraud signals
- Catches single-transaction fraud AND fraud rings in one pass
- Zero feature engineering, zero graph construction, zero pipeline code
- One platform, one query, both fraud types
PQL Query
PREDICT is_fraud FOR EACH transactions.transaction_id WHERE transactions.amount > 50
One PQL query replaces the full fraud detection pipeline: tabular feature engineering, graph construction, model training, and scoring. KumoRFM reads raw accounts, transactions, devices, and address tables directly and discovers both single-transaction and ring-based fraud patterns.
Output
| Transaction ID | Fraud prob (Kumo) | Fraud prob (XGBoost) | Why Kumo differs |
|---|---|---|---|
| TXN-8821 | 0.94 | 0.91 | Both flag - high amount, new merchant (tabular signal) |
| TXN-8822 | 0.88 | 0.23 | Kumo detects shared-device ring (7 accounts, 2 devices) |
| TXN-8823 | 0.91 | 0.18 | Kumo sees money mule chain (4 hops to known fraud account) |
| TXN-8824 | 0.05 | 0.41 | Kumo correctly scores lower - graph context shows a legitimate business pattern |
Why KumoRFM handles both worlds
Most fraud teams face a bad choice: stick with XGBoost and miss rings, or build a custom GNN pipeline and deal with months of graph infrastructure work. KumoRFM removes this tradeoff.
KumoRFM is a relational foundation model. It reads raw relational tables - accounts, transactions, devices, addresses, merchants - and automatically constructs the heterogeneous graph that connects them. It then discovers predictive patterns across both individual entity features (what XGBoost sees) and multi-hop relational structure (what only a GNN can see).
The key advantage: you do not need to choose between tabular and graph. You do not need to build a graph database, write graph queries, or maintain graph infrastructure. You point KumoRFM at your existing data warehouse tables and write a PQL query. The model figures out which patterns - tabular, relational, or both - predict fraud for your specific data.
When to use each algorithm
Here is the direct guidance, based on fraud type:
| Fraud type | Best algorithm | Why |
|---|---|---|
| Card-not-present fraud | XGBoost or KumoRFM | Strong tabular signals (amount, merchant, velocity). XGBoost handles well. KumoRFM adds device-graph context. |
| Account takeover (individual) | XGBoost or KumoRFM | Login anomalies, behavioral shifts. Tabular features capture most signal. |
| Account takeover (coordinated batch) | GNN or KumoRFM | Credential batches produce temporal clusters across account graph. Tabular models miss the coordination. |
| Synthetic identity fraud | GNN or KumoRFM | Shared SSN fragments, address similarities, application timing. Ring structure is the primary signal. |
| Money mule networks | GNN or KumoRFM | Chain-of-transfer patterns across 4-7 hops. Invisible to flat-table models. |
| Return fraud rings | GNN or KumoRFM | Coordinated returns across accounts, stores, and payment methods. Connection patterns are the signal. |
| First-party fraud | XGBoost or KumoRFM | Behavioral patterns of the account holder. Tabular features capture most signal. |
| Bust-out fraud | GNN or KumoRFM | Accounts that build credit and default together share hidden connections. Graph reveals the coordination. |
Recommendation by fraud type. XGBoost works for individual-level fraud. GNNs are required for organized and coordinated fraud. KumoRFM handles both in a single platform.