Enterprise fraud detection has changed more in the last three years than in the previous decade. The shift is not about faster rules or better dashboards. It is about a fundamental change in how fraud detection systems see your data.
Traditional tools - rules engines and flat-table machine learning - evaluate each transaction independently. They ask: "Is this single transaction suspicious?" They check amount, time of day, merchant category, and maybe a few aggregated features like "average_transaction_amount_30d."
But the fraud patterns causing the most damage in 2026 are not single-transaction anomalies. They are structural patterns that span multiple accounts, devices, and entities. Fraud rings. Multi-hop money laundering. Synthetic identities that share addresses, phone numbers, and device fingerprints with known bad actors. These patterns are only visible when you look at the connections between entities - the graph.
The headline result: SAP SALT benchmark
Before comparing individual tools, here is the result that matters most. The SAP SALT benchmark is an enterprise-grade evaluation where real business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes on production-quality enterprise databases with multiple related tables.
| approach | accuracy | what_it_means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.
KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.
The comparison at a glance
| Tool | Approach | Detects Graph Patterns | Real-Time | Multi-Table Native | Explainability | Deployment | Best For |
|---|---|---|---|---|---|---|---|
| Kumo.ai | GNN on relational graph | Yes (multi-hop) | Yes | Yes | Graph-path explanations | 4-8 weeks | Fraud rings, laundering networks, complex multi-entity fraud |
| DataVisor | Unsupervised ML | Partial (clustering) | Yes | No | Cluster labels | 8-12 weeks | Account fraud, mass-registration attacks |
| Featurespace | Adaptive behavioral analytics | No | Yes | No | Behavioral profiles | 12-20 weeks | Payment fraud, real-time session anomalies |
| NICE Actimize | Rules + ML hybrid | No | Yes | No | Rule audit trails | 6-12 months | AML/KYC compliance, large bank regulatory requirements |
| Sardine | Device intelligence + behavioral biometrics | No | Yes | No | Device risk scores | 4-8 weeks | Account takeover, bot detection, onboarding fraud |
| Feedzai | Real-time AI + rules | Partial (link analysis) | Yes | No | Feature importance + rules | 8-16 weeks | Card fraud, payment processing, omnichannel |
| AWS SageMaker + Neptune | Build-your-own GNN | Yes (custom) | Depends on architecture | Manual | Custom | 6-18 months | Teams with ML expertise wanting full control |
| DataRobot | AutoML on flat features | No | Yes (scoring) | No | Feature importance (SHAP) | 2-4 months | Teams with feature stores, tabular fraud signals |
Kumo.ai is the only tool in this comparison that natively reads multi-table relational data as a graph and detects multi-hop fraud patterns without manual feature engineering.
Why graph structure matters for fraud
Before diving into individual tools, it is worth understanding why the "detects graph patterns" column in the table above matters more than any other.
Consider a simple laundering ring: Account A sends $4,800 to Account B. Account B sends $4,700 to Account C. Account C sends $4,600 to Account D. Account D sends $4,500 back to Account A. Each individual transaction is under $5,000 (below reporting thresholds), involves different account pairs, and looks completely normal in isolation.
A rules-based system sees four normal transactions. A flat-table ML model sees four normal transactions with normal feature values. A GNN sees a cycle in the transaction graph - money flowing in a ring with consistent decrements (the launderer's fee). The ring pattern is only visible when you see the connections.
| Transaction | Rules Engine | Flat-Table ML (XGBoost) | GNN (Graph View) |
|---|---|---|---|
| A -> B ($4,800) | Normal (under threshold) | Normal (avg features OK) | Sees A connected to B |
| B -> C ($4,700) | Normal (under threshold) | Normal (avg features OK) | Sees B connected to A and C |
| C -> D ($4,600) | Normal (under threshold) | Normal (avg features OK) | Sees emerging chain A-B-C-D |
| D -> A ($4,500) | Normal (under threshold) | Normal (avg features OK) | Detects cycle: A-B-C-D-A with consistent decrements |
Only when the fourth transaction completes the cycle does the fraud pattern become visible - and only if the system sees the graph structure. Rules and flat-table ML never detect this pattern.
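The cycle in the table above can be found with a plain depth-first search over the transaction graph. This is an illustrative sketch in ordinary Python - not any vendor's detection logic - with account names and amounts taken from the worked example:

```python
from collections import defaultdict

# Toy transaction graph from the table above: each edge is (sender, receiver, amount).
transactions = [
    ("A", "B", 4800),
    ("B", "C", 4700),
    ("C", "D", 4600),
    ("D", "A", 4500),
]

def find_cycles(edges):
    """Return simple cycles in a directed graph via DFS from each node."""
    graph = defaultdict(list)
    for src, dst, _ in edges:
        graph[src].append(dst)

    cycles = []
    def dfs(start, node, path, visited):
        for nxt in graph[node]:
            if nxt == start and len(path) > 1:
                cycles.append(path + [start])       # closed the loop back to start
            elif nxt not in visited:
                dfs(start, nxt, path + [nxt], visited | {nxt})

    for start in list(graph):
        dfs(start, start, [start], {start})
    # Deduplicate rotations: keep only cycles that begin at their smallest node.
    return [c for c in cycles if c[0] == min(c[:-1])]

print(find_cycles(transactions))  # [['A', 'B', 'C', 'D', 'A']]
```

Each transaction in isolation is just one edge; the ring only appears once all four edges are in the same graph, which is the point the table is making.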
This is not a theoretical example. FinCEN reports that cyclical transaction patterns account for an estimated 30-40% of money laundering volume. Fraud rings - where multiple synthetic or compromised accounts collude - share devices, IP addresses, and beneficiaries in graph patterns that flat-table analysis cannot detect.
The false positive problem
Detection accuracy is only half the story. The other half is false positives - legitimate transactions incorrectly flagged as fraud. False positives are expensive: each one requires manual investigation (15-30 minutes of analyst time), damages customer experience, and can block legitimate revenue.
| Approach | Typical False Positive Rate | Investigation Cost per 100K Transactions | Why |
|---|---|---|---|
| Rules-based | 95%+ | $450K-$600K | Static thresholds cannot adapt to individual behavior |
| Flat-table ML (XGBoost) | 70-80% | $200K-$300K | Better than rules but still sees each transaction independently |
| Behavioral analytics | 60-75% | $150K-$250K | Understands session context but not entity relationships |
| GNN-based | 50-60% | $100K-$150K | Understands full entity context: who transacts with whom, shared devices, relationship history |
GNN-based approaches cut false positive rates roughly in half compared to flat-table ML by incorporating relational context. Each percentage point reduction saves investigation costs and improves customer experience.
The reason GNNs reduce false positives is straightforward: context. When a rules engine flags a $9,500 wire transfer, it does not know that the sender regularly sends similar amounts to this recipient, that both accounts have been active for 8 years, and that the recipient is a verified vendor. The GNN knows all of this because it reads the full relationship graph. Legitimate transactions that look suspicious in isolation look normal in context.
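A back-of-envelope model makes the cost mechanics concrete. The alert rate and loaded analyst cost below are illustrative assumptions (they are not the exact model behind the table above); the false positive rates are taken from the rules-based and GNN-based rows:

```python
# Back-of-envelope model for false-positive investigation cost.
# Alert rate and analyst cost per hour are illustrative assumptions.
def investigation_cost(transactions, alert_rate, fp_rate,
                       minutes_per_case=20, analyst_cost_per_hour=60.0):
    """Return (false positives, cost of manually investigating them)."""
    alerts = transactions * alert_rate
    false_positives = alerts * fp_rate
    hours = false_positives * minutes_per_case / 60
    return false_positives, hours * analyst_cost_per_hour

# Rules engine (95% of alerts false) vs GNN (55%) at an assumed 2% alert rate.
for label, fp in [("rules", 0.95), ("gnn", 0.55)]:
    fps, cost = investigation_cost(100_000, 0.02, fp)
    print(f"{label}: {fps:.0f} false positives, ${cost:,.0f} in analyst time")
# rules: 1900 false positives, $38,000 in analyst time
# gnn: 1100 false positives, $22,000 in analyst time
```

The absolute dollar figures move with the assumptions, but the ratio between the two rows does not: halving the false positive rate halves the wasted investigation time at any alert volume.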
Tool-by-tool analysis
1. Kumo.ai - GNN-based relational fraud detection
Kumo.ai represents your transaction data as a temporal heterogeneous graph - accounts, transactions, devices, IP addresses, and merchants become nodes, and relationships between them become edges. A graph neural network processes this structure to detect fraud patterns that span multiple entities and time windows.
The key differentiator is that Kumo reads relational tables directly. You do not need to pre-build a flat feature table or manually engineer graph features. You point PQL (Predictive Query Language) at your existing database tables, and the GNN discovers which relational patterns are predictive of fraud.
PQL Query
PREDICT is_fraud FOR EACH transactions.transaction_id WHERE transactions.timestamp > '2026-03-01'
One query replaces the entire fraud feature engineering pipeline. The GNN reads your transactions, accounts, devices, and merchant tables directly - discovering multi-hop patterns like fraud rings, shared-device clusters, and laundering cycles without manual feature engineering.
Output
| transaction_id | fraud_probability | top_signal | XGBoost_baseline |
|---|---|---|---|
| TXN-88412 | 0.94 | 3-hop cycle detected (A-B-C-A) | 0.31 |
| TXN-88413 | 0.07 | Established sender-receiver pair | 0.42 |
| TXN-88414 | 0.88 | Shared device with flagged cluster | 0.28 |
| TXN-88415 | 0.03 | Normal pattern for merchant category | 0.05 |
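The "shared device with flagged cluster" signal in the output above can be approximated with connected components over account-device links: accounts that share device fingerprints, directly or transitively, fall into one cluster, so a flag on any member raises risk for the rest. A minimal union-find sketch (account and device IDs here are made up for illustration):

```python
# Connected components over account-device links via union-find.
# Account/device IDs are hypothetical illustrations.
def cluster_by_shared_device(links):
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)

    for account, device in links:
        union(account, device)

    clusters = {}
    for account, _ in links:
        clusters.setdefault(find(account), set()).add(account)
    return list(clusters.values())

links = [("acct_1", "dev_X"), ("acct_2", "dev_X"),
         ("acct_3", "dev_Y"), ("acct_2", "dev_Y"),
         ("acct_4", "dev_Z")]
print(cluster_by_shared_device(links))
# acct_1, acct_2, acct_3 share devices transitively; acct_4 stands alone.
```

A GNN goes further than this - it weighs the cluster membership alongside transaction history and other edges - but connected components capture the basic structural idea.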
On fraud-adjacent RelBench benchmarks, Kumo's GNN achieves 0.89 recall compared to 0.81 for XGBoost with manually engineered features. The 8-point recall gap translates directly to fraud caught: for every 100 fraudulent transactions, the GNN catches 89 while XGBoost catches 81. At enterprise scale (millions of transactions per day), those 8 extra catches per 100 represent significant prevented losses.
Strengths: Multi-hop fraud ring detection, no feature engineering required, reads relational data natively, graph-path explanations for investigators, 4-8 week deployment.
Considerations: Requires relational data (transaction tables with foreign keys). Maximum value on data with rich entity relationships. Newer entrant compared to legacy compliance platforms.
2. DataVisor - unsupervised ML for fraud
DataVisor uses unsupervised machine learning to detect fraud patterns without labeled training data. Its core approach clusters accounts and transactions to identify groups of entities behaving similarly in suspicious ways - useful for catching coordinated attacks like mass-registration fraud and bot-driven account abuse.
The unsupervised approach is valuable when you lack labeled fraud data (a common problem for new fraud vectors). DataVisor can identify suspicious clusters before any known fraud has been confirmed.
Strengths: Does not require labeled fraud data, detects coordinated attacks, strong on account-level fraud, real-time scoring.
Limitations: Clustering is not the same as graph analysis - it groups similar entities but does not trace multi-hop connection paths. Requires flat feature input. Less effective on transaction-level fraud where the pattern is structural (rings, chains) rather than behavioral (similar accounts acting similarly).
3. Featurespace - adaptive behavioral analytics (ARIC)
Featurespace's ARIC platform builds adaptive behavioral profiles for each entity (cardholder, account, device) and detects anomalies in real time. It learns what "normal" looks like for each entity and flags deviations. The system adapts continuously as behavior changes.
ARIC is particularly strong on payment fraud - card-not-present transactions, real-time authorization decisions, and session-level anomaly detection. Its behavioral approach catches fraud that static rules miss (a transaction that is unusual for this specific customer, even if it looks normal in aggregate).
Strengths: Real-time adaptive scoring, strong behavioral baselines, well-established in payment processing, good for card fraud and authorization decisions.
Limitations: Does not analyze graph structure between entities. Each entity is profiled independently. Cannot detect fraud rings where individual entity behavior is normal but the connection pattern is suspicious. Longer deployment (12-20 weeks to build behavioral baselines).
4. NICE Actimize - rules + ML hybrid for compliance
NICE Actimize is the legacy incumbent in financial crime compliance. Its platform covers AML, KYC, fraud detection, and sanctions screening with a rules engine augmented by machine learning. Most of the largest global banks use Actimize for regulatory compliance.
The platform's strength is its comprehensive regulatory coverage and audit trails. When a regulator asks "why was this transaction flagged?", Actimize provides a clear rule-based explanation. Its ML layer improves accuracy over pure rules but operates on flat feature inputs.
Strengths: Regulatory compliance coverage, established at major banks, comprehensive audit trails, broad financial crime coverage (AML + KYC + fraud).
Limitations: Rules-based core produces high false positive rates (95%+). ML augmentation helps but still requires flat feature engineering. Long deployment cycles (6-12 months). High implementation and licensing costs. Does not detect graph patterns.
5. Sardine - device intelligence + behavioral biometrics
Sardine is a newer entrant focused on device intelligence and behavioral biometrics. It analyzes how users interact with devices - typing patterns, mouse movements, screen pressure, device fingerprints - to detect account takeover, bot activity, and onboarding fraud.
Sardine's approach is complementary to transaction-level fraud detection. It answers a different question: "Is the person interacting with this device who they claim to be?" rather than "Is this transaction fraudulent?"
Strengths: Strong device fingerprinting, behavioral biometrics for account takeover, fast deployment (4-8 weeks), effective for onboarding fraud and bot detection, modern API-first architecture.
Limitations: Focused on device/session layer, not transaction pattern analysis. Cannot detect fraud rings or laundering patterns. Best as a complementary layer rather than a primary fraud detection system for financial crime.
6. Feedzai - real-time AI for financial crime
Feedzai offers a real-time AI platform for financial crime detection, combining machine learning with a rules engine and some link analysis capabilities. It serves large banks and payment processors with real-time scoring at high transaction volumes.
Feedzai's link analysis provides partial graph awareness - it can identify direct connections between entities (shared devices, shared addresses) but does not perform the deep multi-hop graph traversal that catches complex fraud rings. It sits between flat-table ML and full graph analysis.
Strengths: Real-time scoring at scale, combines ML + rules + link analysis, strong in payment processing and card fraud, established with large financial institutions.
Limitations: Link analysis is not full graph ML - it finds direct connections but not multi-hop patterns (3+ hops). Requires feature engineering for the ML component. Mid-range deployment timeline (8-16 weeks).
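The distinction between direct-link analysis and multi-hop traversal is easy to make concrete with a breadth-first search bounded by hop count. The toy entity graph below is hypothetical: edges stand for shared attributes or payments, and the "kingpin" sits four hops from the victim account - visible to deep traversal, invisible to 1-hop link analysis:

```python
from collections import deque

# Undirected entity graph; entity names are hypothetical illustrations.
edges = [("mule_1", "mule_2"), ("mule_2", "mule_3"),
         ("mule_3", "kingpin"), ("mule_1", "victim_acct")]

def k_hop_neighbors(edges, start, k):
    """Entities reachable from `start` within k hops (bounded BFS)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, frontier, reached = {start}, deque([(start, 0)]), set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                reached.add(nxt)
                frontier.append((nxt, depth + 1))
    return reached

# 1-hop "link analysis" view vs a deeper multi-hop traversal.
print(k_hop_neighbors(edges, "victim_acct", 1))  # {'mule_1'}
print(k_hop_neighbors(edges, "victim_acct", 3))  # adds mule_2, mule_3; kingpin needs hop 4
```

This is only the reachability half of the problem; graph ML additionally learns which multi-hop paths are predictive of fraud rather than enumerating them.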
7. AWS SageMaker + Neptune - build-your-own GNN
AWS offers the building blocks for a custom graph-based fraud detection system: Amazon Neptune (graph database), SageMaker (ML training and hosting), and Deep Graph Library (DGL) for GNN training. This approach gives maximum flexibility but requires a dedicated ML engineering team.
Teams that choose this path get full control over model architecture, training data, and inference pipeline. The trade-off is that everything - data pipeline, graph construction, GNN architecture, feature engineering, model training, deployment, monitoring - must be built and maintained internally.
Strengths: Full architectural control, no vendor lock-in, can be customized for specific fraud patterns, scales with AWS infrastructure, cost-effective at very large scale.
Limitations: Requires 2-4 ML engineers dedicated to fraud. 6-18 month build timeline. Must build and maintain the entire pipeline (graph construction, GNN training, feature engineering, deployment, monitoring). No pre-built fraud-specific patterns or benchmarks. Ongoing maintenance burden.
8. DataRobot - AutoML for fraud
DataRobot applies its AutoML platform to fraud detection: upload a flat feature table, and it automatically selects the best model architecture, tunes hyperparameters, and builds ensembles. It streamlines the modeling step of the fraud detection pipeline.
DataRobot works well when you already have a mature feature engineering pipeline producing a flat table with fraud-relevant features. It reduces the modeling work from weeks to hours. But it inherits the fundamental limitation of flat-table ML: it cannot see graph structure.
Strengths: Fast model building on existing feature tables, SHAP-based explainability, good model governance and monitoring, easy for teams without deep ML expertise.
Limitations: Requires a pre-built flat feature table. Cannot read relational data directly. Cannot detect graph patterns (fraud rings, shared-device clusters). All the feature engineering for fraud - transaction velocities, entity aggregations, network features - must be done manually before DataRobot sees the data.
How to choose: four questions
The right tool depends on your specific fraud landscape, data infrastructure, and team. Four questions cut through the noise:
1. What fraud patterns are you trying to catch?
If your primary fraud vector is individual transaction anomalies (unusual amounts, merchants, times), behavioral analytics (Featurespace) or real-time AI (Feedzai) will serve you well. If your primary vector is organized fraud - rings, laundering networks, synthetic identity clusters - you need graph-level analysis (Kumo.ai, or build-your-own with AWS). Most enterprises face both, which is why a layered approach is common.
2. Do you have a feature engineering team?
DataRobot and flat-table ML tools require someone to build and maintain the feature pipeline. If you have a data science team already producing fraud feature tables, these tools add value on top. If you do not, or if your team is stretched thin, a tool that reads relational data directly (Kumo.ai) eliminates that bottleneck.
3. How important is regulatory compliance?
If regulatory audit trails and compliance coverage are the primary driver (not just fraud detection accuracy), NICE Actimize has the deepest regulatory footprint. Many banks use Actimize for compliance and layer a more accurate ML tool on top for detection.
4. What is your deployment timeline?
If you need results in weeks, Sardine (device intelligence) and Kumo.ai (GNN) deploy fastest. If you can invest 6-18 months and have the ML team, building on AWS gives maximum long-term flexibility.
Stepping back, the tools in this comparison fall into two paradigms:
Flat-table approach (rules, XGBoost, AutoML)
- Evaluates each transaction independently
- Requires manual feature engineering for every fraud signal
- Cannot detect fraud rings or multi-hop laundering
- 70-95% false positive rates depending on rules vs ML
- Misses structural fraud patterns that account for 30-40% of losses
Graph-based approach (GNN)
- Sees the full entity relationship graph
- Discovers fraud-predictive patterns from raw relational data
- Detects rings, chains, shared-device clusters, and laundering cycles
- 50-60% false positive rates with relational context
- Catches organized fraud invisible to single-transaction analysis
The recall gap: what 8 points means at scale
On fraud benchmarks, Kumo's GNN achieves 0.89 recall versus 0.81 for XGBoost with manually engineered features. An 8-point recall gap sounds modest in the abstract. At enterprise scale, it is not.
| Metric | XGBoost (0.81 recall) | GNN (0.89 recall) | Difference |
|---|---|---|---|
| Fraudulent transactions per year | 10,000 | 10,000 | - |
| Fraud caught | 8,100 | 8,900 | +800 additional catches |
| Fraud missed | 1,900 | 1,100 | 42% fewer misses |
| Avg loss per missed fraud | $2,500 | $2,500 | - |
| Annual loss from missed fraud | $4.75M | $2.75M | $2M saved annually |
8 points of recall translates to $2M in prevented annual fraud losses at 10,000 fraudulent transactions per year. At larger scale (100K+ fraud events), the impact is proportionally larger.
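The table's arithmetic is simple enough to verify in a few lines, using the recall figures and the $2,500 average loss per missed fraud from the rows above:

```python
# Reproduce the recall-gap table: annual loss from missed fraud.
def missed_fraud_loss(fraud_events, recall, avg_loss=2500):
    """Return (missed fraud count, annual dollar loss from misses)."""
    missed = round(fraud_events * (1 - recall))
    return missed, missed * avg_loss

xgb_missed, xgb_loss = missed_fraud_loss(10_000, 0.81)
gnn_missed, gnn_loss = missed_fraud_loss(10_000, 0.89)
print(xgb_missed, xgb_loss)   # 1900 4750000
print(gnn_missed, gnn_loss)   # 1100 2750000
print(xgb_loss - gnn_loss)    # 2000000
```

Scaling `fraud_events` to 100,000 scales the saved loss proportionally, to $20M annually under the same per-event loss assumption.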
The recall advantage compounds with the false positive reduction. More fraud caught (higher recall) plus fewer false alerts (lower false positive rate) means the investigation team spends more of its time on real fraud and less on false alarms. This is the dual benefit of understanding the transaction graph.
Deployment architecture: where each tool fits
In practice, enterprise fraud detection is rarely a single tool. Most large financial institutions deploy a layered architecture:
- Layer 1: Rules engine (NICE Actimize or similar) for regulatory compliance, sanctions screening, and known fraud patterns. This layer catches obvious violations and satisfies regulatory requirements.
- Layer 2: Real-time behavioral/session analysis (Featurespace, Sardine, or Feedzai) for session-level anomalies, device intelligence, and real-time authorization decisions. This layer catches individual transaction and session anomalies.
- Layer 3: Graph-based ML (Kumo.ai or custom AWS) for detecting structural fraud patterns - rings, laundering networks, synthetic identity clusters. This layer catches the organized fraud that layers 1 and 2 miss.
Each layer addresses a different fraud vector. Rules catch known patterns. Behavioral analytics catch anomalous sessions. Graph ML catches structural patterns. The combination is stronger than any single tool.
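The three layers can be sketched as sequential scoring stages. Everything below is a hypothetical placeholder - the function, thresholds, and score semantics are illustrative, not the decision logic of any vendor named above:

```python
# Sketch of the three-layer architecture as sequential checks.
# All thresholds and score semantics are hypothetical placeholders.
def layered_decision(txn, rules_hit, behavior_score, graph_score):
    """Return (decision, triggering_layer) for a transaction.

    rules_hit:      Layer 1 - compliance rules / sanctions screening
    behavior_score: Layer 2 - session/behavioral anomaly score in [0, 1]
    graph_score:    Layer 3 - graph-ML fraud probability in [0, 1]
    """
    if rules_hit:                      # Layer 1: hard regulatory block
        return "block", "rules"
    if behavior_score > 0.9:           # Layer 2: session anomaly
        return "review", "behavioral"
    if graph_score > 0.8:              # Layer 3: structural fraud pattern
        return "review", "graph"
    return "approve", None

# A transaction with a normal session but a strong graph signal - the case
# layers 1 and 2 miss on their own.
print(layered_decision({"id": "TXN-1"}, False, 0.2, 0.94))  # ('review', 'graph')
```

In production the layers typically run in parallel rather than strictly in sequence, with a decision engine combining their outputs; the ordering here just mirrors the narrative above.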