A major bank's fraud detection system flagged 10,000 transactions per day. Their fraud analysts investigated every one. The catch rate? 3%. That means 9,700 alerts per day were false alarms, costing $50 per investigation, burning out their team, and training everyone to ignore the alerts. Meanwhile, a $2M fraud ring operated for 8 months undetected because each individual transaction in the ring looked perfectly normal.
This is not a cautionary tale from 2005. This is the current state of fraud detection at most financial institutions. The systems flag too much, catch too little, and miss the sophisticated fraud entirely.
The problem is not effort. Banks spend billions on fraud prevention. The problem is architecture. Most fraud detection systems evaluate each transaction independently, like trying to spot a conspiracy by reading one text message at a time. The individual messages look fine. The conspiracy is in the connections between them.
This guide covers the full landscape: the three eras of fraud detection, every algorithm worth considering (with honest assessments of each), the metrics that matter when analyst capacity is your bottleneck, 8 concrete methods to improve accuracy, and the architectural shift from transaction-level to graph-level detection that catches what traditional systems cannot see.
The 3 eras of fraud detection
Fraud detection has evolved through three distinct phases. Each one was a genuine improvement over the last. Most organizations are stuck in Era 2 while fraudsters have already moved to patterns that require Era 3 thinking.
Era 1: Rules (1990s to present)
If transaction amount exceeds $10,000, flag it. If the card is used in two countries within an hour, flag it. If the merchant category is on the high-risk list, flag it. Rules are explicit, transparent, and easy to explain to regulators. They are also the reason your bank calls you every time you buy something on vacation.
Rules work for known, static fraud patterns. The problem is that fraudsters read the same rules you do. A $10,000 threshold means they run transactions at $9,999. A velocity check of 5 transactions per hour means they run 4. Every rule you publish is a playbook for how to avoid detection.
The deeper problem is combinatorial explosion. A single rule is simple. Ten thousand rules interacting with each other in production is a system nobody fully understands. Banks accumulate rules over decades. Nobody removes old ones because nobody knows which ones are still catching fraud and which ones are just generating false positives. The result: 90-95% false alarm rates and a team of analysts who have learned that most alerts are noise.
Era 2: ML on flat tables (2010s to present)
Take every transaction, compute features (amount, time of day, merchant category, days since last transaction, average spend for this customer), flatten it all into a single row, and feed it to XGBoost. This was a genuine leap. Instead of hand-coding thresholds, the model learns the patterns from labeled data. False positive rates dropped from 95% to 50-70%. Catch rates improved from 40-60% to 70-80%.
But Era 2 has a structural limitation: it evaluates each transaction in isolation. The model sees a row of numbers. It does not see that this transaction is one link in a chain of 15 transfers that form a circle. It does not see that the receiving account shares a device fingerprint with 4 accounts flagged for fraud last month. It does not see that the merchant has received transfers from 30 newly created accounts in the last week.
Era 2 catches the clumsy fraud. The stolen credit card used for a $5,000 purchase at 3 AM in a country the cardholder has never visited. That is a bright red dot on a flat table. But organized fraud, coordinated networks where each individual action looks legitimate? That requires seeing the connections.
Era 3: Graph ML (2020s and emerging)
Instead of flattening everything into one row per transaction, graph ML keeps the full network structure. Accounts, devices, IP addresses, merchants, and beneficiaries are nodes. Transactions, logins, and shared attributes are edges. The model learns on the topology, not just the features.
This is not an incremental improvement. It is a category change. Graph ML sees fraud rings (circular money flows), money mule networks (accounts that receive and rapidly forward funds), synthetic identity clusters (fake identities sharing real attributes), and coordinated attacks (many accounts hitting the same target in a pattern). These patterns are mathematically invisible to transaction-level models. They only exist in the relationships between entities.
Table: The three eras of fraud detection
| era | approach | false_positive_rate | fraud_catch_rate | what_it_misses |
|---|---|---|---|---|
| Era 1: Rules | Hand-coded thresholds and velocity checks | 90-95% | 40-60% | Any pattern the rule writer did not anticipate. Adaptive fraudsters. |
| Era 2: ML on flat tables | XGBoost/LightGBM on transaction features | 50-70% | 70-80% | Fraud rings, money mule networks, synthetic identity clusters. Anything requiring entity relationships. |
| Era 3: Graph ML | GNNs on entity-relationship networks | 30-50% | 82-92% | Completely novel fraud types with no historical pattern. Still needs rules as a first layer. |
Each era represents a genuine improvement. Most banks are in Era 2. The fraud they are missing lives in Era 3.
The 6 fraud detection algorithms, honestly compared
Every vendor will tell you their algorithm is the best. Here is what each one actually does well, what it does not, and when you should care.
Table: Fraud detection algorithms compared
| algorithm | the_honest_take | typical_recall | best_for | honest_limitation |
|---|---|---|---|---|
| Rules / Heuristics | Your grandma's fraud detection. Still catches 40% of fraud. Not going anywhere. | 40-60% | Known patterns, regulatory requirements, instant decisions under 1ms | 95% false positive rate. Every rule you add makes the system harder to maintain and easier for fraudsters to reverse-engineer. |
| Logistic Regression | The baseline that embarrassingly often beats fancy models. If you cannot beat logistic regression, your features are the problem, not your algorithm. | 55-70% | Regulated environments where every coefficient must be explainable. Quick baselines. | Cannot capture non-linear interactions without manual feature engineering. Misses complex fraud patterns. |
| Random Forest | Good enough for v1, replaced by XGBoost in v2. Nobody regrets starting here, but nobody stays. | 65-75% | First ML model when you need something fast and interpretable enough for stakeholders | Slower inference than boosted trees. Typically 3-5 recall points behind XGBoost on the same features. |
| XGBoost / LightGBM | The production standard. Most banks run this. If you can only pick one algorithm for a flat feature table, pick this. | 70-82% | Production fraud scoring on transaction features. The default choice for Era 2. | Evaluates each transaction independently. Cannot see network patterns. Blind to fraud rings. |
| Neural Networks (Autoencoders) | Good for anomaly detection when you do not have labeled fraud data. Learns what 'normal' looks like and flags deviations. | 60-75% | New fraud types with no historical labels. Detecting unknown-unknowns. | Higher false positive rate than supervised models. 'Anomalous' does not mean 'fraudulent.' A first-time luxury purchase is anomalous but legitimate. |
| Graph Neural Networks | The only approach that sees fraud RINGS, not just fraud TRANSACTIONS. Categorically different, not just incrementally better. | 82-92% | Fraud rings, money mule detection, synthetic identity clusters, any pattern requiring entity relationships | Requires graph-structured data. Higher computational cost. Harder to explain individual decisions without path-based attribution. |
Highlighted: XGBoost is the current production standard for transaction-level fraud. GNNs achieve higher recall by reading network topology that flat-table models cannot access.
Notice that the recall ranges overlap. A well-featured XGBoost model can beat a poorly-constructed GNN. The algorithm matters, but the data architecture matters more. The question is not "which algorithm is best?" but "what structure does your data need to be in for the algorithm to see the patterns?"
Fraud detection metrics (different from every other ML problem)
Fraud metrics are not churn metrics. In churn prediction, you might be able to call 500 at-risk customers this month. In fraud detection, you might be able to investigate 100 alerts per day, and each investigation costs $50 and takes 45 minutes. The bottleneck is not the model. It is the human on the other end.
Table: Fraud detection metrics
| metric | what_it_measures | the_analogy | when_to_use_it | watch_out_for |
|---|---|---|---|---|
| Precision at K | Of the top K alerts, how many are real fraud? | If you can only open 100 cases today, how many will be worth your time? | When analyst capacity is fixed and you need to maximize value per investigation | Ignores fraud below the cutoff. You might have great precision at 100 but miss 500 real fraud cases. |
| Recall | Of all actual fraud, how much did the model catch? | How much fraud slipped through while you were investigating the alerts you did catch? | When a missed fraud case costs $500K+ and an investigation costs $50 | You can get 100% recall by flagging every transaction. Recall without precision is useless. |
| False Positive Rate | Of legitimate transactions, how many were wrongly flagged? | The metric that determines whether your analysts trust the system or ignore it. | Always track this. It is the leading indicator of analyst burnout and alert fatigue. | A 1% FPR sounds low until you realize that on 1M daily transactions, that is 10,000 false alarms. |
| $ Saved vs. $ Investigated | Total fraud dollars caught divided by total investigation cost | Your fraud team's return on investment, expressed as a ratio. | Executive reporting. Justifying headcount and tool spend. | Can be gamed by only investigating high-dollar cases and ignoring small-dollar fraud that adds up. |
| AUC-ROC | How well the model ranks fraudulent transactions above legitimate ones across all thresholds | Your model's overall discrimination ability. 50% is random. 90%+ is strong. | Comparing models during development. General-purpose evaluation. | Flatters your model on extremely imbalanced data (99.9% legitimate). Use PR-AUC alongside. |
| PR-AUC | Precision-Recall tradeoff across all thresholds, focused on the fraud class | The honest metric. Ignores the easy 'not fraud' predictions entirely. | Model selection on highly imbalanced fraud data. The metric that does not lie. | Harder to interpret. A 'good' PR-AUC depends heavily on the base fraud rate. |
Precision at K maps most directly to operational reality. False positive rate determines analyst trust. PR-AUC is the most honest comparison metric.
In fraud, precision is how much you trust the alarm. Recall is whether you sleep at night. You need both, but the balance depends on your specific economics.
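As a concrete sketch, precision at K is just the fraud rate among the K highest-scored alerts. The toy scores and labels below are illustrative, not from any real dataset:

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Of the K highest-scored alerts, what fraction are confirmed fraud?"""
    order = np.argsort(scores)[::-1][:k]   # indices of the top-K scores
    return labels[order].mean()

# Toy example: 8 transactions, 3 of them fraud.
scores = np.array([0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.05, 0.6])
labels = np.array([1,   0,   1,   0,   0,   0,   0,    1])

# If analysts can open 4 cases today, 3 of the top-4 alerts are real fraud.
print(precision_at_k(scores, labels, k=4))
```

Set `k` to your team's actual daily investigation capacity, not a round number, or the metric measures a workload you do not have.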
Choosing the right metric for your fraud operation
Table: Which fraud metric for which scenario
| your_situation | optimize_for | why | threshold_strategy |
|---|---|---|---|
| Small fraud team (5-10 analysts), high transaction volume | Precision at K | Every false alarm wastes 45 minutes of scarce analyst time. Make each investigation count. | Set K to your daily investigation capacity. Optimize model to maximize precision at that K. |
| High-value transactions (wire transfers, ACH) | Recall | A single missed wire fraud can cost $500K-$5M. The investigation cost is trivial by comparison. | Low threshold. Flag aggressively. Hire more analysts if needed. |
| Card-not-present e-commerce fraud | F1 or balanced precision/recall | Average fraud is $100-500. Investigation cost is $50. You need balance, not extremes. | Medium threshold. Target 30-40% precision with 75%+ recall. |
| Reporting to the board / regulators | $ Saved vs. $ Investigated + Recall | Board cares about ROI. Regulators care about fraud you missed. | Report both. ROI for the CFO. Recall for the compliance team. |
There is no single best metric. The right choice depends on your investigation capacity, average fraud value, and regulatory requirements.
8 methods to improve fraud detection accuracy
These are ordered from quickest wins to the most transformative changes. Methods 1-7 work within the transaction-level paradigm. Method 8 changes the paradigm entirely.
1. Feature velocity (transactions per hour, not just amount)
A $200 purchase is normal. Five $200 purchases in 10 minutes is not. Static features like transaction amount miss the temporal dimension entirely. Velocity features capture it: transactions per hour, distinct merchants per day, total spend in the last 60 minutes, number of failed attempts in the last 30 minutes.
Card testing attacks are the textbook example. A fraudster with a stolen card number runs small transactions ($1-5) at multiple merchants in rapid succession to test which cards are live. Each transaction looks innocent. The velocity is the signal. Compute txn_count_last_1h, txn_count_last_24h, distinct_merchants_last_1h, and failed_txn_count_last_30m at minimum.
Typical improvement: 5-10 recall points over static features alone. This is usually the single biggest jump from a single feature category. If you are not computing velocity features, start here.
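A minimal pandas sketch of two of the velocity features named above, using a time-based rolling window. The column names and toy transactions are illustrative:

```python
import pandas as pd

# Hypothetical card transaction log (four rapid small charges, then one later).
txns = pd.DataFrame({
    "card_id":   ["C1"] * 5,
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:02", "2024-01-01 10:04",
        "2024-01-01 10:06", "2024-01-01 18:00"]),
    "merchant":  ["M1", "M2", "M3", "M4", "M1"],
    "amount":    [3.0, 2.0, 4.0, 1.0, 120.0],
}).sort_values("timestamp")

# Time-based rolling windows need a sorted DatetimeIndex.
txns = txns.set_index("timestamp")
grouped = txns.groupby("card_id")

# Each row sees the 1-hour window ending at its own timestamp.
txns["txn_count_last_1h"] = grouped["amount"].transform(
    lambda s: s.rolling("1h").count())
txns["spend_last_1h"] = grouped["amount"].transform(
    lambda s: s.rolling("1h").sum())
print(txns[["txn_count_last_1h", "spend_last_1h"]])
```

The card-testing burst shows up immediately: the fourth $1-4 charge carries `txn_count_last_1h = 4` even though each amount alone is unremarkable.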
2. Time-of-day and day-of-week patterns
Legitimate customers have patterns. They buy coffee at 7 AM, gas at 5 PM, groceries on Saturday. Fraud does not follow these patterns because the fraudster does not know the cardholder's routine. A transaction at 3 AM on a Tuesday from a customer who has never transacted after 10 PM is a signal, not in isolation, but combined with other features.
Compute the deviation from the customer's historical time pattern: hour_deviation_from_avg and is_unusual_day_of_week. Also compute global risk by time slot: fraud rates are 2-3x higher between 1 AM and 5 AM across most datasets.
Typical improvement: 2-4 recall points. Modest but essentially free to implement.
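One subtlety with an `hour_deviation_from_avg` feature: hours wrap around midnight, so 11 PM and 1 AM are 2 hours apart, not 22. A circular-mean sketch (the customer history below is made up) handles that:

```python
import numpy as np

def hour_deviation(txn_hour, customer_hours):
    """Circular distance (in hours) between a transaction's hour and the
    customer's historical mean hour-of-day."""
    # Map hours onto the unit circle so 23:00 and 01:00 are 2h apart.
    angles = np.array(customer_hours) * 2 * np.pi / 24
    mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    mean_hour = (mean_angle * 24 / (2 * np.pi)) % 24
    diff = abs(txn_hour - mean_hour) % 24
    return min(diff, 24 - diff)

# A customer who usually transacts between 7 AM and 9 AM.
history = [7, 8, 8, 9, 7]
print(hour_deviation(3, history))   # a 3 AM transaction is far from habit
print(hour_deviation(8, history))   # an 8 AM transaction is right on it
```

Feed the deviation in as a feature alongside the global time-slot risk; neither is a flag on its own.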
3. Merchant category risk scoring
Not all merchants are equal. Gas stations, online gambling, and cryptocurrency exchanges have fraud rates 5-10x higher than grocery stores and utilities. Compute a merchant-category fraud rate from your historical data and use it as a feature. Better yet, compute a merchant-level risk score that updates weekly based on recent fraud reports against that specific merchant.
The nuance: do not hardcode merchant categories as "high risk" based on intuition. Compute it from data. Some "high risk" categories in your portfolio might have low fraud rates because your existing rules already over-monitor them, while "low risk" categories might be where fraud is actually hiding.
Typical improvement: 1-3 recall points. More valuable for reducing false positives than for catching new fraud.
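Computing the category risk from data, as the nuance above demands, needs one guard: sparse categories. A smoothed estimate pulls categories with few observations toward the global fraud rate instead of trusting a 50% rate computed from two transactions. A sketch with made-up data and an illustrative smoothing strength `alpha`:

```python
import pandas as pd

def category_risk(df, alpha=20):
    """Smoothed fraud rate per merchant category. alpha acts like alpha
    prior observations at the global rate, damping sparse categories."""
    global_rate = df["is_fraud"].mean()
    stats = df.groupby("category")["is_fraud"].agg(["sum", "count"])
    return (stats["sum"] + alpha * global_rate) / (stats["count"] + alpha)

# Toy data: 'crypto' has 1 fraud in 2 txns -- a raw 50% rate,
# but far too few observations to trust it.
df = pd.DataFrame({
    "category": ["grocery"] * 100 + ["crypto"] * 2,
    "is_fraud": [0] * 99 + [1] + [0, 1],
})
risk = category_risk(df)
print(risk)
```

The smoothed crypto rate still ranks above grocery, but nowhere near the raw 50% that two transactions would naively suggest.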
4. Device and IP fingerprinting
The same device used across multiple accounts is a red flag. An IP address associated with a known proxy or VPN service is a signal. A device fingerprint that has never been seen before on a high-value transaction is suspicious. Device intelligence adds an entirely different dimension of signal that transaction features alone cannot capture.
Key features: device_accounts_count (how many accounts have used this device), ip_risk_score (VPN, proxy, or datacenter IP), is_new_device_for_customer, and device_fraud_history_count (fraud cases associated with this device in the last 90 days).
Typical improvement: 3-7 recall points. Among the highest-value feature categories for card-not-present fraud.
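The `device_accounts_count` feature above is a one-line aggregation once you log device fingerprints per event. A sketch with illustrative IDs:

```python
import pandas as pd

# Hypothetical login/transaction events with device fingerprints.
events = pd.DataFrame({
    "device_id":  ["D1", "D1", "D1", "D2", "D2"],
    "account_id": ["A1", "A2", "A3", "A4", "A4"],
})

# device_accounts_count: distinct accounts seen on each device.
device_accounts = events.groupby("device_id")["account_id"].nunique()
events["device_accounts_count"] = events["device_id"].map(device_accounts)
print(events)
```

Three distinct accounts on device D1 is the kind of signal no amount of per-transaction feature engineering surfaces; the same pattern extends to IPs and mailing addresses.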
5. Network features (shared addresses, phones, devices)
This is where we start crossing from Era 2 into Era 3 territory. Two accounts sharing the same phone number, email domain, physical address, or device fingerprint creates an implicit network. Even without a full graph model, you can compute network-derived features: accounts_sharing_this_device, fraud_rate_of_connected_accounts, avg_account_age_of_network.
Synthetic identity fraud is the use case that makes this essential. Fraudsters create fake identities using combinations of real and fabricated data. Each identity looks legitimate in isolation. But they share attributes: the same phone number on 5 "different" people, the same mailing address, the same device fingerprint. The network reveals the cluster.
Typical improvement: 3-8 recall points. The improvement is dramatic for synthetic identity and account takeover fraud.
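Even without a graph database, the shared-attribute clusters described above fall out of a union-find pass over (account, attribute) pairs. A stdlib-only sketch; the records are invented for illustration:

```python
from collections import defaultdict

def cluster_accounts(records):
    """Cluster accounts that share any attribute (phone, address, device).
    records: list of (account_id, attribute_value) pairs."""
    parent = {}

    def find(x):                      # union-find with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    by_attr = defaultdict(list)
    for acct, attr in records:
        by_attr[attr].append(acct)
    for accts in by_attr.values():    # accounts sharing an attribute merge
        for other in accts[1:]:
            union(accts[0], other)

    clusters = defaultdict(set)
    for acct, _ in records:
        clusters[find(acct)].add(acct)
    return list(clusters.values())

# Three "different" identities chained by a shared phone and address.
records = [("A1", "phone:555-0100"), ("A2", "phone:555-0100"),
           ("A2", "addr:12 Main St"), ("A3", "addr:12 Main St"),
           ("A4", "phone:555-0199"), ("A5", "device:fp-9f2c")]
clusters = cluster_accounts(records)
print(sorted(len(c) for c in clusters))
```

Note the transitivity: A1 and A3 share nothing directly, yet land in the same cluster through A2. That chaining is exactly how synthetic identity rings surface.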
6. Anomaly scores as features
Train an autoencoder or isolation forest on legitimate transactions. Compute the reconstruction error or anomaly score for each transaction. Feed that score as a feature into your supervised model. This gives XGBoost a "weirdness detector" that captures novel fraud patterns the supervised model has never seen in its training labels.
The trick: the anomaly model should be trained only on confirmed legitimate transactions, not on the full dataset. This makes the anomaly score a measure of "how different is this from known good behavior" rather than "how different is this from average behavior."
Typical improvement: 1-4 recall points. Most valuable for catching new fraud types that are not represented in your historical labels.
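A minimal sketch of the trick using scikit-learn's IsolationForest, with synthetic "confirmed legitimate" data standing in for your labeled history:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Train the anomaly model ONLY on confirmed-legitimate transactions.
# Two illustrative features: amount (~$50) and hour of day (~noon).
legit = rng.normal(loc=[50, 12], scale=[10, 2], size=(500, 2))
iso = IsolationForest(random_state=0).fit(legit)

# score_samples: higher = more normal. Negate so higher = weirder,
# then feed this column into the supervised model as one more feature.
candidates = np.array([[52, 13],     # resembles known-good behavior
                       [900, 3]])    # large amount at 3 AM
weirdness = -iso.score_samples(candidates)
print(weirdness)
```

The supervised model then learns when weirdness matters: a high anomaly score plus a risky merchant category is very different from a high anomaly score on a customer's first vacation purchase.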
7. Ensemble stacking (rules + ML + anomaly)
The best production systems are not one model. They are three layers working together. Layer 1: rules catch the known, obvious patterns in under 1 millisecond. Layer 2: XGBoost scores every transaction that passes the rules layer, using the full feature set. Layer 3: anomaly detection catches the novel patterns that neither rules nor supervised ML have seen before.
Feed the outputs of all three layers into a meta-learner (usually logistic regression) that learns when to trust each component. The rules layer catches card testing attacks instantly. The XGBoost layer catches complex but known patterns. The anomaly layer catches the new attack vector that appeared last Tuesday.
Typical improvement: 2-5 recall points over the best single model, with lower false positive rate than any individual component.
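The meta-learner layer can be sketched in a few lines. The three "layer outputs" below are simulated stand-ins (real systems would use held-out predictions from the actual rules engine, XGBoost model, and anomaly detector):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
y = rng.random(n) < 0.05                       # ~5% fraud base rate

# Simulated layer outputs: each is noisy but correlated with the label.
rule_hits  = (y & (rng.random(n) < 0.6)) | (rng.random(n) < 0.1)
xgb_score  = 0.7 * y + 0.3 * rng.random(n)
anom_score = 0.4 * y + 0.6 * rng.random(n)

X = np.column_stack([rule_hits, xgb_score, anom_score]).astype(float)

# The meta-learner learns how much to trust each component.
meta = LogisticRegression().fit(X, y)
print(meta.coef_)   # one learned weight per layer
```

Fit the meta-learner on out-of-fold predictions, never on the same data the base models trained on, or it will simply learn to over-trust whichever base model memorized the most.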
8. Graph features (connected accounts, transaction flow patterns)
Methods 1-7 look at each transaction through a microscope. Method 8 steps back and looks at the whole crime scene.
Instead of computing features for a single transaction, compute features across the entire network of entities connected to that transaction. The account, the device, the IP, the merchant, the beneficiary, and every other entity connected to any of those nodes. How many of those connected entities have fraud history? What is the average age of accounts in this cluster? Is there a circular flow pattern? Are funds being received and forwarded rapidly (the money mule signature)?
Full graph neural networks take this further by learning the features automatically through message passing across the network. Instead of hand-engineering graph features, the GNN discovers which network patterns predict fraud by propagating information along edges.
Typical improvement: 8-15 recall points over transaction-level features. This is not incremental. This is a step function, especially for organized fraud, rings, mule networks, and synthetic identity clusters.
The graph advantage: seeing the crime scene, not just the evidence
Traditional fraud detection is like trying to solve a conspiracy by reading individual text messages. Graph-based detection reads the entire conversation, across all participants, in order. Here is what that means in practice.
The fraud ring example
Account A sends $2,000 to Account B. B sends $1,800 to Account C. C sends $1,600 to Account D. D sends $1,400 back to Account A. Each transaction is below the $10,000 reporting threshold. Each amount is different (no round numbers to trigger rules). Each transfer has a plausible description ("freelance payment," "rent share," "equipment purchase"). The accounts have legitimate history and normal activity patterns.
A transaction-level model scores each transfer independently. Score: low risk, low risk, low risk, low risk. Four green lights. The roughly $7K in transfers that just cycled through a laundering loop is invisible.
A graph model sees the topology. A to B to C to D to A. A cycle. With decreasing amounts at each hop (the "service fee" skimmed by each mule). The pattern is textbook. The graph model flags the ring, not because any single transaction is suspicious, but because the structure is.
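The structural check itself is not exotic: the A-to-B-to-C-to-D-to-A loop is a directed cycle, findable with a standard depth-first search. A stdlib-only sketch over the example's transfers (account names are the illustrative ones from above):

```python
from collections import defaultdict

def find_cycles(edges):
    """Detect directed cycles in a transfer graph via DFS with
    white/gray/black coloring. edges: list of (sender, receiver)."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)

    WHITE, GRAY, BLACK = 0, 1, 2
    color, stack, cycles = defaultdict(int), [], []

    def dfs(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:                  # back edge: a cycle
                cycles.append(stack[stack.index(nxt):] + [nxt])
            elif color[nxt] == WHITE:
                dfs(nxt)
        stack.pop()
        color[node] = BLACK

    for node in list(graph):
        if color[node] == WHITE:
            dfs(node)
    return cycles

# The ring from the example, plus some ordinary one-way transfers.
transfers = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"),
             ("E", "F"), ("F", "G")]
print(find_cycles(transfers))
```

In production you would bound cycle length and window the edges by time, but the point stands: the ring is one query against the topology, and no per-transaction feature can express it.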
Money mule detection
Money mules are accounts that receive funds from multiple sources and rapidly forward them to other accounts, taking a small cut. They are the plumbing of organized financial crime. At the transaction level, each deposit and withdrawal looks like normal banking. At the graph level, the pattern is obvious: high in-degree (many senders), high out-degree (many recipients), short time between receiving and forwarding, and connections to known bad accounts.
A mule account might have 20 incoming transfers from 15 different accounts, with 80% of the funds forwarded within 4 hours to 3 accounts. That fan-in/fan-out pattern with rapid forwarding is a strong structural signal that no amount of transaction-level feature engineering will capture.
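The fan-in/fan-out signature reduces to a handful of per-account aggregations over the transfer graph. A pandas sketch with an invented mule account `M`:

```python
import pandas as pd

# Toy transfer ledger: four senders fund M, which forwards most of it.
transfers = pd.DataFrame({
    "sender":   ["S1", "S2", "S3", "S4", "M",  "M",  "M",  "X"],
    "receiver": ["M",  "M",  "M",  "M",  "R1", "R2", "R3", "Y"],
    "amount":   [500,  700,  400,  400,  600,  500,  480,  50],
})

# Structural mule signals per account: fan-in, fan-out, pass-through ratio.
fan_in   = transfers.groupby("receiver")["sender"].nunique()
fan_out  = transfers.groupby("sender")["receiver"].nunique()
received = transfers.groupby("receiver")["amount"].sum()
sent     = transfers.groupby("sender")["amount"].sum()

stats = pd.DataFrame({"fan_in": fan_in, "fan_out": fan_out,
                      "in_amt": received, "out_amt": sent}).fillna(0)
stats["forward_ratio"] = stats["out_amt"] / stats["in_amt"].clip(lower=1)
print(stats.loc["M"])
```

Account `M` shows high fan-in, high fan-out, and ~79% of received funds forwarded: individually mundane transactions, structurally a mule. Adding receive-to-forward latency (hours, not days) sharpens the signal further.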
The benchmark: GNN vs. XGBoost on fraud data
Table: GNN vs. XGBoost fraud benchmark
| approach | recall | false_positive_rate | what_it_captures |
|---|---|---|---|
| XGBoost on transaction features | 0.81 | ~2.5% | Individual transaction anomalies. Stolen cards, unusual amounts, velocity spikes. |
| GNN on entity-relationship graph | 0.89 | ~1.8% | All of the above PLUS fraud rings, money mule networks, synthetic identity clusters, coordinated attacks. |
The GNN achieves 0.89 recall vs. XGBoost's 0.81, an 8-point improvement. The gap is widest on organized fraud that is invisible at the transaction level.
That 8-point recall gap translates directly to dollars. On a portfolio with $10M in annual fraud losses, 8 additional recall points means catching $800K in fraud that the transaction-level model misses entirely. And the GNN does this while simultaneously reducing the false positive rate, because it has richer signals to distinguish real fraud from legitimate-but-unusual transactions.
PQL for fraud detection
PQL Query
PREDICT is_fraud_7d FOR EACH transactions.transaction_id WHERE transactions.amount > 50 AND transactions.timestamp > now() - 30d
This query predicts 7-day fraud probability for recent transactions over $50. The model reads the full entity graph (accounts, devices, merchants, IPs) and computes graph-derived features automatically, including circular flow detection and connected-entity fraud history.
Output
| transaction_id | fraud_probability | top_driver | recommended_action |
|---|---|---|---|
| TXN-88201 | 0.94 | Part of 4-account circular transfer pattern | Block + escalate to fraud ring investigation |
| TXN-88202 | 0.87 | Device shared with 3 accounts flagged in last 30 days | Hold transaction, verify identity |
| TXN-88203 | 0.62 | Merchant received 12 first-time customer txns in 1 hour | Enhanced monitoring on merchant |
| TXN-88204 | 0.08 | Normal pattern for this customer and merchant | Approve (no action) |
Transaction-level fraud detection
- One row per transaction with computed features
- Requires manual feature engineering (velocity, amount stats, time patterns)
- Cannot see fraud rings or circular money flows
- Cannot detect synthetic identity clusters sharing attributes
- Typical recall: 70-82% with high false positive rates
Graph-based fraud detection
- Reads the full entity-relationship network directly
- Learns graph features automatically through message passing
- Detects fraud rings, circular flows, and coordinated attacks natively
- Identifies synthetic identity clusters through shared-attribute topology
- Typical recall: 82-92% with lower false positive rates
Fraud detection tools: an honest comparison
The right tool depends on your fraud type, transaction volume, regulatory requirements, and team. A $50M fintech and a $500B bank have very different needs. Here is the honest breakdown.
Table: Fraud detection tools compared
| tool | type | best_for | honest_limitation |
|---|---|---|---|
| Rules engines (in-house) | Rule-based system | Known patterns, regulatory requirements, instant decisions. Every fraud system needs a rules layer. | 95% false positive rate. 10,000 rules accumulated over a decade that nobody fully understands. Fraudsters reverse-engineer your thresholds. |
| XGBoost / LightGBM pipelines | Open-source ML | Production transaction-level scoring. The accuracy standard for flat feature tables. Full control. | You build and maintain everything: feature pipelines, model training, monitoring, deployment. Requires a data science team. |
| DataVisor | Unsupervised fraud detection | Detecting unknown fraud patterns and coordinated attacks without labeled data. | Higher false positive rate than supervised models. Works best as a complement to supervised systems, not a replacement. |
| Featurespace (ARIC) | Adaptive behavioral analytics | Card payment fraud with real-time adaptive models. Strong in banking and payments. | Primarily focused on payment fraud. Less suited for insurance, lending, or non-payment fraud types. |
| NICE Actimize | Enterprise fraud and AML platform | Large banks needing integrated fraud and anti-money laundering. Regulatory compliance out of the box. | Enterprise pricing and implementation timelines. 6-12 month deployments. Heavy platform, not lightweight. |
| Sardine | Device intelligence + ML | Fintech and neobanks. Strong device fingerprinting, behavioral biometrics, and mule detection. | Newer entrant with less enterprise track record. Best for digital-first businesses, less proven for branch-based banking. |
| AWS Fraud Detector / Neptune GNN | Cloud-native ML + graph | AWS-native organizations wanting managed fraud ML. Neptune adds graph capability for network analysis. | Vendor lock-in to AWS. Neptune GNN requires graph data modeling expertise. The managed ML layer is less customizable than building your own. |
| Kumo.ai | Relational foundation model | Multi-table fraud detection without feature engineering. Reads entity-relationship graphs natively. Catches fraud rings and network patterns. | Requires relational/graph data. If your data is already a single clean transaction table, XGBoost pipelines are simpler to start with. |
Highlighted: Kumo.ai reads relational entity graphs natively, catching fraud rings and network patterns without manual graph feature engineering. For transaction-only data, XGBoost remains the pragmatic starting point.
Picking the right tool for your fraud operation
- Startup / early-stage fintech, limited fraud data: Sardine for device intelligence and behavioral signals. Rules for known patterns. You need external signals when your own fraud history is thin.
- Mid-size bank, established fraud team, flat-table data: XGBoost pipelines for maximum control. Featurespace or DataVisor if you want a managed platform.
- Large bank, multi-entity data, organized fraud problem: Kumo.ai reads your entity-relationship data directly and catches ring patterns, mule networks, and synthetic identity clusters that transaction-level tools miss.
- Regulatory-first, AML + fraud integrated: NICE Actimize for the compliance framework. Layer ML on top for accuracy.
The 6 deadly sins of fraud detection
These mistakes are systemic. They exist at banks, fintechs, and insurance companies right now. Each one looks reasonable from the inside and devastating from the outside.
1. The False Positive Factory
A 95% false positive rate means your fraud team investigates 19 legitimate transactions for every 1 real fraud case. At $50 per investigation, a system that flags 10,000 transactions per day at 95% FPR costs $475,000 per day in wasted analyst time. But the real cost is worse: alert fatigue. After the 15th false alarm in a row, your analysts start rubber-stamping alerts. They stop reading the details. They clear cases in 30 seconds instead of 45 minutes. And the real fraud that does get flagged? It gets rubber-stamped too.
Fix: measure and report false positive rate as a first-class metric. Set an organizational target (under 50% for ML-based systems). If your FPR is above 80%, your system is actively making your team worse at catching fraud.
2. The Threshold Trap
One threshold for all customers. A $5,000 transaction is flagged whether the customer is a college student or a hedge fund manager. The student's $5,000 wire is suspicious. The fund manager's $5,000 wire is a rounding error. Same amount, completely different risk profile. Static thresholds generate massive false positive rates on high-value customers and miss fraud on low-value customers whose typical transactions are $50.
Fix: normalize transaction amounts relative to each customer's historical pattern. amount / avg_amount_90d is a better feature than amount alone. A transaction that is 10x a customer's average is suspicious regardless of whether the absolute amount is $500 or $50,000.
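The fix is one derived column. The customer histories below are invented to make the contrast explicit:

```python
import pandas as pd

txns = pd.DataFrame({
    "customer":       ["student", "fund_mgr"],
    "amount":         [5000.0, 5000.0],
    "avg_amount_90d": [45.0, 250000.0],   # illustrative 90-day averages
})

# Same $5,000; radically different risk once normalized per customer.
txns["amount_ratio"] = txns["amount"] / txns["avg_amount_90d"]
print(txns[["customer", "amount_ratio"]])
```

The student's transaction is over 100x their normal behavior; the fund manager's is a fraction of theirs. A static $5,000 threshold flags both or neither.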
3. The Label Lag
Fraud labels arrive weeks or months after the transaction. A credit card chargeback takes 30-90 days. An internal investigation takes weeks. Your model is training on fraud patterns from 3 months ago, but fraudsters evolved their tactics 2 months ago. Your model is fighting the last war.
Fix: use two feedback loops. A fast loop (24-48 hours) based on analyst decisions: "I investigated this, it was fraud / not fraud." A slow loop (30-90 days) based on confirmed outcomes: chargebacks, account closures, law enforcement reports. Retrain weekly using the fast loop. Validate monthly against the slow loop.
4. The Feature Freeze
The feature table was built 3 years ago when the primary fraud vector was stolen cards. Since then, synthetic identity fraud has tripled, account takeover has doubled, and authorized push payment fraud has emerged as a new category. The features still focus on transaction amount and velocity. Nobody has added device fingerprints, network features, or behavioral biometrics.
Fix: review your feature set quarterly. Every new fraud type should trigger a feature review. If your fraud mix has shifted and your features have not, your model is optimizing for yesterday's threats.
5. The Solo Transaction Fallacy
Evaluating each transaction in isolation is like reading every sentence in a crime novel independently and trying to figure out who committed the murder. The sentences are grammatically correct. The plot only makes sense when you read them in sequence, in context, connected to each other.
The Solo Transaction Fallacy is why fraud rings operate undetected for months. Each transaction in the ring is individually normal. The ring is only visible when you see the connections between transactions, accounts, devices, and merchants.
Fix: move from transaction-level to entity-level analysis. Score accounts, devices, and networks, not just individual transactions. Graph-based approaches do this natively.
6. The Rules Graveyard
Ten thousand rules. Accumulated over 15 years. Written by analysts who left the company a decade ago. Nobody knows which rules are catching real fraud and which are just generating noise. Nobody dares remove any because the one they remove might be the one catching a specific fraud pattern. So the rules pile up, the false positive rate climbs, and the system becomes a black box of conflicting logic that is harder to understand than any neural network.
Fix: audit your rules quarterly. For each rule, measure: how many alerts did it generate? How many were confirmed fraud? What is its precision? Any rule with under 1% precision and no regulatory mandate should be a candidate for removal or replacement with an ML-based equivalent.
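The quarterly audit described above is a small script, not a project. A sketch with a hypothetical audit log (rule names and counts are invented):

```python
# Hypothetical rule-audit log: per rule, alerts fired and confirmed fraud.
rules = [
    {"rule": "amount_gt_10k",      "alerts": 4000, "confirmed": 180},
    {"rule": "two_countries_1h",   "alerts": 900,  "confirmed": 45},
    {"rule": "legacy_mcc_list_v3", "alerts": 5100, "confirmed": 12},
]

for r in rules:
    r["precision"] = r["confirmed"] / r["alerts"]

# Removal candidates: under 1% precision (absent a regulatory mandate).
removal_candidates = [r["rule"] for r in rules if r["precision"] < 0.01]
print(removal_candidates)
```

Run it every quarter, publish the list, and pair each removal candidate with a shadow-mode period before deletion so you can prove it was only generating noise.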