
How Do Companies Like Coinbase and Chime Detect Fraud with ML?

Fintechs do not just run XGBoost on transaction tables. They build multi-layered real-time risk systems that combine rules, supervised ML, behavioral analytics, graph-based detection, and human review. Here is how the stack actually works, what Coinbase and Chime do differently, and where graph ML catches the fraud that the other layers miss.

TL;DR

  1. Modern fintechs use five layers of fraud defense: rules engine (velocity checks, blocklists), supervised ML (XGBoost on transaction features), behavioral analytics (session patterns, device fingerprinting), graph-based detection (network analysis for rings and coordinated fraud), and human-in-the-loop review.
  2. Coinbase combines traditional fintech fraud scoring with crypto-specific signals: blockchain address risk scoring, sequence features from on-chain history, and multi-model architecture spanning both fiat and crypto transaction types.
  3. Chime focuses on real-time behavioral anomaly detection for instant money movement products, relying on spending patterns and peer transaction networks more than traditional identity signals for their underbanked customer base.
  4. Graph-based detection (layer 4) catches the significant share of fraud losses from organized attacks that layers 1-3 miss: fraud rings sharing devices, money mule chains, coordinated account takeovers. KumoRFM reads these patterns natively, tracing connections 6-7 hops deep.
  5. On the SAP SALT benchmark, KumoRFM achieves 91% accuracy vs 75% for PhD data scientists with XGBoost. On RelBench, KumoRFM zero-shot scores 76.71 AUROC vs 62.44 for LightGBM with manual features.

When someone asks "how does Coinbase detect fraud?" or "how does Chime prevent unauthorized transactions?" the real answer is never a single algorithm. It is an architecture. Modern fintechs build layered systems where each layer catches a different fraud type, and the layers work together to balance three competing goals: detection rate, false positive rate, and customer experience.

This matters if you are building or upgrading a fraud detection system, because the biggest mistake teams make is over-investing in one layer (usually supervised ML) while leaving other layers completely unaddressed. The fraud you miss is usually the fraud your architecture cannot structurally detect, not the fraud your model scored incorrectly.

The five layers of fintech fraud defense

Every serious fintech fraud stack has these five layers. The specifics vary, but the architecture is consistent across Coinbase, Chime, Stripe, Square, Revolut, and most well-funded neobanks and payment processors.

Layer 1: Rules engine

Velocity checks (max 5 transactions per hour from a single device), blocklists (known fraudulent IPs, device fingerprints, email domains), hard constraints (transaction limits, geo restrictions), and sanctions screening. Rules are deterministic: they always fire the same way. They catch obvious fraud instantly and enforce regulatory requirements.

  • Best for: Known patterns, repeat offenders, sanctions screening, and regulatory requirements that demand deterministic outputs.
  • Watch out for: Cannot generalize. Every new fraud pattern requires a new rule, and sophisticated fraudsters learn the rules and stay just below the thresholds.
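As a rough illustration of a velocity check, the sketch below keeps a sliding one-hour window of timestamps per device and rejects the sixth transaction inside that window. The class name `VelocityRule`, its parameters, and the in-memory storage are assumptions made for this example; a production rules engine would more likely sit on a shared low-latency store.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600     # one hour
MAX_TXNS_PER_WINDOW = 5   # the example limit quoted in the text


class VelocityRule:
    """Sliding-window transaction counter keyed by device ID (hypothetical helper)."""

    def __init__(self, max_txns=MAX_TXNS_PER_WINDOW, window=WINDOW_SECONDS):
        self.max_txns = max_txns
        self.window = window
        self._seen = defaultdict(deque)  # device_id -> accepted timestamps

    def allow(self, device_id, ts):
        """Return True if a transaction at time `ts` (seconds) is within the limit.

        Assumes timestamps arrive in non-decreasing order per device.
        """
        q = self._seen[device_id]
        # Drop timestamps that have aged out of the window.
        while q and ts - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_txns:
            return False  # over the limit: reject, and do not record it
        q.append(ts)
        return True
```

Replaying the same sequence of timestamps always yields the same decisions, which is the deterministic behavior described above. The sketch also makes the weakness visible: a fraudster who sends exactly five transactions per hour is never stopped.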

Layer 2: Supervised ML

Typically XGBoost or another gradient-boosted tree model trained on transaction features: amount, time of day, merchant category, device type, velocity aggregates, geographic distance from home location. The model scores every transaction in real time (sub-100ms) with a fraud probability. Transactions above the threshold get blocked or sent to review. Supervised ML generalizes better than rules because it learns patterns from labeled data rather than matching explicit scenarios.

  • Best for: Real-time scoring of individual transactions. Catches card-not-present fraud, amount anomalies, and merchant-category patterns.
  • Watch out for: Flat table input. Cannot see connections between entities. Coordinated fraud across multiple accounts is invisible to per-transaction scoring.
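To make "flat table input" concrete, here is a minimal sketch of how one transaction might be flattened into a single feature row before it reaches a tree model. The field names (`amount`, `timestamp`, `lat`, `lon`) and the function `transaction_features` are hypothetical; the model itself is omitted.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def transaction_features(txn, home_lat, home_lon, recent_amounts):
    """Flatten one transaction into the kind of row a tree model consumes.

    `recent_amounts` is the account's own recent transaction amounts,
    used here as a simple velocity aggregate.
    """
    mean_recent = sum(recent_amounts) / len(recent_amounts) if recent_amounts else 0.0
    return {
        "amount": txn["amount"],
        "hour_of_day": txn["timestamp"].hour,
        "distance_from_home_km": haversine_km(home_lat, home_lon, txn["lat"], txn["lon"]),
        "txn_count_recent": len(recent_amounts),
        "amount_to_recent_mean": txn["amount"] / mean_recent if mean_recent else 0.0,
    }
```

Every value in the row is derived from this one account's own history. Nothing in it says which other accounts share the device or the recipient, which is why coordinated activity stays invisible at this layer.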

Layer 3: Behavioral analytics

Session-level patterns that go beyond transaction features. How quickly does the user navigate the app? What is their typical login-to-transaction time? Do they usually transfer to this recipient? Device fingerprinting identifies devices even across browser resets and VPN changes. Typing cadence, swipe patterns, and navigation paths create a behavioral biometric that is hard to replicate.

  • Best for: Account takeover detection where credentials are correct but behavior is wrong. Bot detection and credential-stuffing attacks.
  • Watch out for: Per-session view. Cannot detect fraud rings where each individual session looks normal but the network of accounts is coordinated.
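One simple way to express "behavior is wrong" is a deviation score against the user's own baseline. The sketch below computes a z-score for a single session metric such as login-to-transaction time in seconds. The function name and the choice of a z-score are assumptions for illustration; real behavioral systems combine many signals.

```python
import statistics


def behavioral_deviation(history, observed):
    """z-score of `observed` against the user's own history.

    Returns 0.0 when there is too little history to form a baseline.
    """
    if len(history) < 2:
        return 0.0
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        # Perfectly constant history: any change is maximally unusual.
        return 0.0 if observed == mean else float("inf")
    return (observed - mean) / stdev
```

A large negative score on login-to-transaction time (the session moved much faster than this user normally does) is the kind of signal that can point to scripted or bot-driven activity even when the credentials are correct.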

Layer 4: Graph-based detection

This layer analyzes the network of connections between accounts, devices, addresses, and transaction counterparties. It catches fraud that lives in relationships: 20 accounts sharing 3 devices, money mule chains forwarding funds through 5 intermediaries, coordinated account openings from the same IP block. Graph-based detection finds the organized fraud that layers 1-3 miss because their per-transaction or per-session view cannot see coordination across multiple entities. This is where KumoRFM fits.

  • Best for: Fraud rings, money mule chains, coordinated account takeovers, and any organized fraud that spans multiple entities and accounts.
  • Watch out for: Higher latency than layers 1-2 (typically under 500ms vs under 100ms). For instant-decision products, graph scores may need to run asynchronously and feed into risk thresholds.
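The "20 accounts sharing 3 devices" pattern can be sketched as a connected-components problem on an account-device graph. The example below uses a small union-find structure; it is a classic heuristic shown for illustration, not a description of how KumoRFM works internally, and the function name and `min_accounts` threshold are assumptions.

```python
from collections import defaultdict


def shared_device_clusters(logins, min_accounts=3):
    """Group accounts that are linked through shared devices.

    `logins` is an iterable of (account_id, device_id) pairs.
    Returns a list of (accounts, devices) set pairs for every connected
    component containing at least `min_accounts` accounts.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Tag nodes so an account ID can never collide with a device ID.
    for account, device in logins:
        union(("acct", account), ("dev", device))

    groups = defaultdict(lambda: (set(), set()))
    for node in list(parent):
        kind, ident = node
        accounts, devices = groups[find(node)]
        (accounts if kind == "acct" else devices).add(ident)

    return [(a, d) for a, d in groups.values() if len(a) >= min_accounts]
```

Each login pair on its own looks unremarkable. The signal only appears once the pairs are joined into a component, which is the structural point this layer rests on.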

Layer 5: Human-in-the-loop review

Transactions in the uncertain zone between clear-approve and clear-block get routed to human analysts. Good fraud teams use this layer not just for decisioning but as a feedback loop: analyst decisions become training labels for the ML models, and patterns that analysts catch repeatedly get encoded as rules or model features. The goal is to shrink this layer over time as the automated layers improve.

  • Best for: Edge cases, novel fraud patterns, and building training data to improve automated layers over time.
  • Watch out for: Scale-limited. If more than 5-10% of transactions reach this layer, your automated layers need improvement. Human review is expensive and slow.
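The "uncertain zone" can be pictured as two thresholds on a fraud score, with everything in between routed to analysts. The sketch below is a minimal illustration; the threshold values 0.2 and 0.9 and both function names are assumptions, not figures from any company's system.

```python
def route(score, approve_below=0.2, block_above=0.9):
    """Route a fraud probability to 'approve', 'review', or 'block'."""
    if score >= block_above:
        return "block"
    if score < approve_below:
        return "approve"
    return "review"  # uncertain zone: send to a human analyst


def review_rate(scores, approve_below=0.2, block_above=0.9):
    """Fraction of transactions that would land in the human review queue."""
    if not scores:
        return 0.0
    routed = [route(s, approve_below, block_above) for s in scores]
    return routed.count("review") / len(routed)
```

Tracking `review_rate` over time gives a direct check against the 5-10% guideline above: if the fraction keeps climbing, the automated layers are leaving too much in the uncertain zone.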

five_layers_fintech_fraud_defense

| Layer | What it does | Fraud types caught | Fraud types missed | Latency |
| --- | --- | --- | --- | --- |
| 1. Rules engine | Velocity checks, blocklists, hard constraints, sanctions | Known patterns, repeat offenders, sanctions violations | Novel fraud, pattern variations, organized rings | <10ms |
| 2. Supervised ML | Real-time transaction scoring on tabular features | Individually anomalous transactions, card-not-present fraud | Coordinated fraud, shared-device rings, money mule chains | <100ms |
| 3. Behavioral analytics | Session patterns, device fingerprinting, biometric signals | Account takeover, bot attacks, credential stuffing | Fraud from legitimate devices, coordinated ring activity | <100ms |
| 4. Graph-based detection | Network analysis of entity relationships and connections | Fraud rings, money mules, coordinated attacks, shared-device clusters | Individual anomalous transactions (better caught by layer 2) | <500ms |
| 5. Human review | Analyst decisioning + model feedback loop | Edge cases, novel patterns, complex scenarios | Scale-limited - cannot review every transaction | Minutes to hours |

Each layer catches fraud types that the others miss. The architecture is layered precisely because no single approach catches everything.

What Coinbase does differently

Coinbase operates at the intersection of traditional financial fraud and crypto-native fraud. Their fraud stack handles both fiat transactions (bank deposits, card purchases) and cryptocurrency transactions (sends, swaps, DeFi interactions), each with different signal profiles.

Three things stand out about Coinbase's approach:

  1. Sequence features from on-chain history. For crypto transactions, Coinbase builds features from the blockchain itself: wallet age, transaction frequency patterns, interaction history with known high-risk protocols (mixers, bridges that have been exploited, sanctioned addresses). A withdrawal to a wallet that has only existed for 2 hours and received funds from a known mixer carries different risk than a withdrawal to a wallet with 3 years of normal DeFi activity. These sequence features are crypto-specific signals that traditional fraud models never see.
  2. Blockchain address risk scoring. Every destination address gets a risk score based on its on-chain history and connections. This is essentially a graph problem on the blockchain: how many hops away is this address from known bad actors? Has it received funds from sanctioned wallets? Does it have the transaction pattern of a personal wallet, an exchange, or a mixing service? Coinbase integrates with blockchain analytics providers like Chainalysis and also builds proprietary scoring.
  3. Multi-model architecture. Coinbase does not run one fraud model. They run multiple specialized models: one for fiat deposit fraud, one for crypto send risk, one for account takeover, one for new account fraud. Each model uses different features and different thresholds because the fraud patterns and acceptable false positive rates differ by transaction type. The models feed into a unified decisioning layer that combines scores with rules and routes to human review when needed.
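The question "how many hops away is this address from known bad actors?" maps onto a multi-source breadth-first search. The sketch below is a generic illustration of that idea, not Coinbase's or any analytics vendor's actual scoring method; the function name, the undirected treatment of transfers, and the `max_hops` cutoff are all assumptions.

```python
from collections import defaultdict, deque


def hops_to_flagged(transfers, flagged, max_hops=7):
    """Minimum hop count from any flagged address to each reachable address.

    `transfers` is an iterable of (source_address, destination_address).
    Addresses farther than `max_hops` away are left out of the result.
    """
    neighbors = defaultdict(set)
    for src, dst in transfers:
        neighbors[src].add(dst)
        neighbors[dst].add(src)  # treat exposure as undirected

    dist = {addr: 0 for addr in flagged}
    queue = deque(flagged)
    while queue:
        addr = queue.popleft()
        if dist[addr] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors[addr]:
            if nxt not in dist:
                dist[nxt] = dist[addr] + 1
                queue.append(nxt)
    return dist
```

A destination address that is absent from the result has no path to a flagged address within the hop limit. An address one hop from a mixer would get a small distance, which a risk score could then weight alongside wallet age and transaction pattern.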

What Chime does differently

Chime serves a different customer base than Coinbase. Their users are primarily underbanked consumers who may have thin credit files, limited banking history, and less conventional income patterns. This creates a unique fraud detection challenge: the signals that traditional banks use to establish trust (long credit history, stable income, existing banking relationships) are often unavailable.

Three things stand out about Chime's approach:

  1. Real-time ML scoring for instant products. Chime's SpotMe (overdraft coverage) and pay-anyone features require fraud decisions in milliseconds. You cannot hold a transaction for manual review when the product promise is instant money movement. Chime runs real-time ML models that score every transaction against the account's behavioral baseline: is this amount typical? Is this recipient in the user's usual transfer pattern? Is this device consistent with their history?
  2. Behavioral signals over identity signals. Because their customer base has thinner identity histories, Chime relies more on behavioral analytics: how users interact with the app over time, spending pattern consistency, peer transaction networks (who sends money to whom regularly). A user who has been depositing their paycheck biweekly for 8 months and sending rent to the same recipient monthly has built a strong behavioral baseline. A sudden $2,000 transfer to a new recipient at 3 AM triggers an anomaly score based on behavioral deviation, not just transaction features.
  3. Peer transaction network analysis. Chime's pay-anyone feature creates a natural transaction graph between users: who sends money to whom, how often, and in what amounts. This peer network contains fraud signals: newly opened accounts that immediately receive transfers from multiple established accounts (potential money mule pattern), clusters of accounts that only transact with each other (potential fraud ring), and accounts that receive funds and immediately transfer everything out (pass-through behavior).
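The pass-through pattern from the last item can be approximated with a simple ratio: how much of what an account received was sent back out shortly afterwards. This is an illustrative sketch, not Chime's implementation; the function name, the one-hour window, and the 0.9 ratio are assumptions.

```python
from collections import defaultdict


def pass_through_accounts(transfers, window=3600, min_ratio=0.9):
    """Flag accounts that quickly forward most of the money they receive.

    `transfers` is a list of (timestamp_seconds, sender, receiver, amount).
    """
    inbound = defaultdict(list)   # account -> [(ts, amount)]
    outbound = defaultdict(list)  # account -> [(ts, amount)]
    for ts, sender, receiver, amount in transfers:
        outbound[sender].append((ts, amount))
        inbound[receiver].append((ts, amount))

    flagged = []
    for account, received in inbound.items():
        total_in = sum(amount for _, amount in received)
        # Outbound money sent within `window` seconds after any inbound transfer.
        fast_out = sum(
            amount
            for out_ts, amount in outbound.get(account, [])
            if any(0 <= out_ts - in_ts <= window for in_ts, _ in received)
        )
        if total_in > 0 and fast_out / total_in >= min_ratio:
            flagged.append(account)
    return sorted(flagged)
```

An account that receives a paycheck and pays rent a day later is not flagged, because the outbound transfer falls outside the window. An account that forwards 99% of an incoming transfer two minutes later is.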

Traditional bank vs fintech fraud stack

The difference is not just technology. It is architecture philosophy. Banks started with rules and added ML as an overlay. Fintechs started with ML and use rules as a floor.

traditional_bank_vs_fintech_fraud_stack

| Dimension | Traditional bank | Modern fintech |
| --- | --- | --- |
| Primary detection | Rule-based (NICE Actimize, SAS AML) | ML-first (XGBoost, custom models) |
| Model update cycle | Quarterly to annually | Weekly to daily |
| Decision speed | Minutes to days (batch + manual review heavy) | Milliseconds (real-time scoring, minimal manual review) |
| False positive rate | 90%+ on rule-based alerts | 20-40% with ML scoring |
| Graph-based detection | Rare - some have Quantexa or similar for investigation | Growing - layer 4 adoption increasing as organized fraud grows |
| Behavioral analytics | Limited - session monitoring for online banking | Deep - device fingerprinting, typing patterns, navigation analysis |
| Customer friction | High - frequent blocks, slow resolution | Low - step-up verification instead of hard blocks |
| Feedback loop speed | Slow - analyst labels take weeks to reach models | Fast - automated label pipelines, rapid retraining |
| Organized fraud detection | Weak - rules miss coordination, no native graph analysis | Improving - graph-based layer catching rings and mule chains |
| Tech stack ownership | Vendor-dependent (long integration cycles) | In-house or API-first (rapid iteration) |

Fintechs move faster at every layer. But both face the same structural gap: organized fraud requires graph-based detection that most stacks still lack.

Where graph ML fits: catching what the other layers miss

Layers 1-3 are good at catching fraud from individual bad actors: stolen cards, compromised accounts, bot-driven attacks. They struggle with organized fraud because each layer evaluates transactions or sessions individually. They cannot see coordination across multiple entities.

Here is what organized fraud looks like in practice, and why it requires a graph-based approach:

  1. Fraud rings on shared devices. A ring of 20 accounts controlled by the same group, all logging in from 3 physical devices. Each account individually passes behavioral checks because the fraudsters have learned to mimic normal behavior. Layer 2 (ML) scores each transaction as low risk because the amounts and patterns look normal. Layer 3 (behavioral) does not flag anything because the session patterns are realistic. But the graph reveals that 20 accounts sharing 3 devices is not normal. That cluster structure is the fraud signal.
  2. Money mule chains. Stolen funds move through a chain of accounts: A sends to B, B sends to C, C sends to D, D withdraws or converts to crypto. Each individual transfer is below thresholds and between apparently unrelated accounts. The chain is only visible in the transaction graph: rapid sequential transfers through newly opened accounts with no prior relationship to the sender.
  3. Coordinated account takeover. Credentials from a data breach are sold in batches. The buyer tests and takes over accounts in bulk. Individually, each account takeover might trigger behavioral anomaly detection (Layer 3). But the coordination, with 50 accounts taken over within the same 24-hour window using the same credential-testing patterns, is only visible in the graph. The graph shows the temporal cluster and the shared behavioral fingerprint across the batch.
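The money mule chain in item 2 can be sketched as a search for time-ordered transfer paths: each hop must follow the previous one within a short gap. The code below is a toy illustration under stated assumptions (the function name, the one-hour `max_gap`, and the 7-hop limit are hypothetical), and it enumerates paths by brute force, which would not scale to a production graph.

```python
from collections import defaultdict


def trace_chains(transfers, start, max_gap=3600, max_hops=7):
    """Return all maximal time-ordered transfer paths beginning at `start`.

    `transfers` is a list of (timestamp_seconds, sender, receiver).
    A hop is followed only if it happens no more than `max_gap` seconds
    after the hop that delivered funds to the current account.
    """
    outgoing = defaultdict(list)
    for ts, sender, receiver in transfers:
        outgoing[sender].append((ts, receiver))

    chains = []

    def walk(account, last_ts, path):
        extended = False
        if len(path) - 1 < max_hops:  # hops taken so far
            for ts, nxt in outgoing.get(account, []):
                in_order = last_ts is None or 0 <= ts - last_ts <= max_gap
                if in_order and nxt not in path:  # avoid cycles
                    extended = True
                    walk(nxt, ts, path + [nxt])
        if not extended and len(path) > 1:
            chains.append(path)  # path cannot be extended further

    walk(start, None, [start])
    return chains
```

For the A to B to C to D example above, each individual transfer would pass a per-transaction check, while the traced path shows four accounts moving funds in rapid sequence.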

Fintech fraud stack without graph layer

  • Rules catch known patterns and repeat offenders
  • Supervised ML scores individual transactions accurately
  • Behavioral analytics catches account takeover from anomalous sessions
  • Blind to fraud rings sharing devices across 20+ accounts
  • Cannot trace money mule chains through 4-7 intermediary accounts
  • Misses coordinated attacks where individual transactions look normal
  • A significant share of fraud losses comes from organized ring attacks these layers cannot see

Fintech fraud stack with KumoRFM at layer 4

  • Layers 1-3 continue handling individual fraud types
  • KumoRFM reads the full account-device-transaction-address graph natively
  • Detects fraud rings by identifying shared-device clusters and behavioral similarity
  • Traces money mule chains 6-7 hops deep through intermediary accounts
  • Identifies coordinated attacks from temporal and network patterns
  • Reduces false positives by 40-60% through network context
  • No graph construction, no feature engineering - reads raw relational tables directly

PQL Query

PREDICT is_fraud
FOR EACH transactions.transaction_id
WHERE transactions.created_at > '2026-03-01'

One PQL query adds graph-based detection to your existing fraud stack. KumoRFM reads raw accounts, transactions, devices, and address tables from your data warehouse and discovers both single-transaction and network-based fraud patterns automatically.

Output

| transaction_id | fraud_prob_kumo | fraud_prob_layer2_ml | what_kumo_sees |
| --- | --- | --- | --- |
| TXN-91001 | 0.93 | 0.88 | Both flag - stolen card, high-amount anomaly (tabular signal) |
| TXN-91002 | 0.89 | 0.15 | Fraud ring: account shares 2 devices with 18 other accounts opened in same week |
| TXN-91003 | 0.85 | 0.22 | Money mule chain: 5-hop transfer path from compromised account to crypto exchange |
| TXN-91004 | 0.04 | 0.38 | Layer 2 flags velocity spike but graph shows legitimate payroll batch pattern |

The benchmark evidence

Fintech fraud data lives in multiple related tables: accounts, transactions, devices, sessions, addresses, merchants. The benchmarks that test multi-table prediction directly measure the capability that matters for fintech fraud detection.

sap_salt_benchmark_fintech

| Approach | Accuracy | What it means for fintech fraud |
| --- | --- | --- |
| LLM + AutoML | 63% | Generates features from table descriptions. No relational pattern discovery. |
| PhD Data Scientist + XGBoost | 75% | Expert hand-crafts cross-table features. Captures some relational signal but limited depth. |
| KumoRFM (zero-shot) | 91% | Reads relational tables directly. Discovers multi-hop fraud patterns automatically. |

SAP SALT benchmark: KumoRFM outperforms expert-tuned XGBoost by 16 percentage points. For fintech fraud, this gap means catching organized attacks that flat-table models structurally miss.

relbench_benchmark_fintech

| Approach | AUROC | Feature engineering time |
| --- | --- | --- |
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |

RelBench benchmark across 7 databases and 30 tasks: KumoRFM zero-shot outperforms manually engineered LightGBM by 14+ AUROC points.

How to add graph-based detection to your existing stack

You do not need to rebuild your fraud system to get graph-based detection. The practical path is additive:

  1. Keep layers 1-3 running. Your rules engine, supervised ML, and behavioral analytics are catching individual fraud effectively. Do not break what works.
  2. Connect KumoRFM to your data warehouse. Point it at your accounts, transactions, devices, and address tables. No ETL, no graph database, no feature engineering. KumoRFM reads the relational tables directly.
  3. Run graph-based scoring in parallel. Generate fraud probability scores that incorporate network context. These scores complement your existing layer 2 ML scores rather than replacing them.
  4. Combine scores in your decisioning layer. Use layer 2 scores for individual transaction risk and layer 4 scores for network-based risk. Weight them based on fraud type: individual card fraud leans on layer 2, organized ring activity leans on layer 4.
  5. Feed layer 5 analyst decisions back to all models. Human review labels improve both your supervised ML and graph-based detection over time. The feedback loop makes every layer better.
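Step 4 can be sketched as a weighted blend whose weights depend on the suspected fraud type. The weights, the fraud-type labels, and the function name below are placeholders chosen for illustration; in practice they would be tuned on labeled outcomes.

```python
# (layer 2 weight, layer 4 weight) per suspected fraud type.
# Hypothetical values: card fraud leans on layer 2, ring activity on layer 4.
WEIGHTS = {
    "card_fraud": (0.7, 0.3),
    "ring_activity": (0.3, 0.7),
}


def combined_score(layer2_score, layer4_score, fraud_type):
    """Blend per-transaction and network-based fraud probabilities."""
    w2, w4 = WEIGHTS[fraud_type]
    return w2 * layer2_score + w4 * layer4_score


# Scores taken from the example output table earlier in the article.
ring_case = combined_score(0.15, 0.89, "ring_activity")  # TXN-91002
card_case = combined_score(0.88, 0.93, "card_fraud")     # TXN-91001
```

A linear blend is only one option. Taking the maximum of the two scores, or training a small model on both, are equally plausible designs for a decisioning layer.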

Frequently asked questions

How do companies like Coinbase and Chime detect fraud with ML?

Modern fintechs like Coinbase and Chime use multi-layered fraud detection stacks. The typical architecture has five layers: (1) a rules engine for velocity checks, blocklists, and hard constraints, (2) supervised ML models like XGBoost on transaction features for real-time scoring, (3) behavioral analytics using session patterns and device fingerprinting, (4) graph-based detection for network analysis of fraud rings, shared devices, and money mule patterns, and (5) human-in-the-loop review for edge cases and model feedback. Coinbase adds crypto-specific layers like blockchain address risk scoring and sequence features from on-chain transaction history. Chime focuses on real-time behavioral anomaly detection for instant money movement products.

What is the difference between how traditional banks and fintechs detect fraud?

Traditional banks rely heavily on rule-based systems (NICE Actimize, SAS) that flag transactions matching predefined scenarios. These systems produce high false positive rates (often 90%+) and react slowly to new fraud patterns. Fintechs build ML-first fraud stacks: real-time scoring models that evaluate every transaction in milliseconds, behavioral analytics that track session-level patterns, and automated decisioning that blocks or allows transactions without manual review for the majority of cases. Fintechs also iterate faster because they own their tech stack and can deploy model updates in days rather than months. The tradeoff: fintechs face higher fraud velocity because they offer instant money movement, and their younger customer base has thinner identity histories.

How does Coinbase detect fraud on cryptocurrency transactions?

Coinbase uses a multi-model architecture that combines traditional fintech fraud detection with crypto-specific signals. Their system includes sequence features derived from on-chain transaction history (wallet age, transaction patterns, known mixer usage), blockchain address risk scoring that evaluates the risk profile of destination addresses, supervised ML models for transaction-level scoring, and behavioral analytics for account-level anomaly detection. They also integrate blockchain analytics from providers like Chainalysis for sanctions screening and exposure analysis. The multi-model approach is necessary because crypto fraud spans both traditional patterns (account takeover, identity fraud) and crypto-native patterns (rug pulls, bridge exploits, mixer-based laundering).

How does Chime detect fraud in real time?

Chime processes millions of transactions daily and needs sub-100ms fraud scoring for their instant money movement products (SpotMe, pay-anyone transfers). Their stack uses real-time ML scoring on transaction features, behavioral anomaly detection that compares each action against the account holder's established patterns, and device intelligence for fingerprinting and session analysis. Because Chime serves an underbanked population with thinner credit histories, they rely more on behavioral signals (how users interact with the app, spending patterns over time, peer transaction networks) than on traditional identity verification signals.

What is graph-based fraud detection and why do fintechs need it?

Graph-based fraud detection models the relationships between accounts, devices, transactions, and addresses as a network (graph) and identifies suspicious patterns in the connections. Fintechs need it because organized fraud, which is the fastest-growing fraud type, produces signals that live in the connections between entities, not in any single transaction. A fraud ring sharing 3 devices across 20 accounts, money mules forwarding funds through chains of newly opened accounts, coordinated account takeovers using credentials from the same data breach: these patterns are invisible to per-transaction ML models but obvious in the graph. KumoRFM reads these network patterns natively, tracing connections 6-7 hops deep.

What are the five layers of fintech fraud defense?

The five layers are: (1) Rules engine, which handles velocity checks (max transactions per hour), blocklists (known bad IPs, devices, addresses), and hard constraints (transaction limits). (2) Supervised ML, typically XGBoost or another gradient-boosted tree model trained on transaction features like amount, time, merchant category, and velocity aggregates. (3) Behavioral analytics, which tracks session-level patterns, device fingerprints, typing cadence, and navigation patterns, and compares them against the user's baseline. (4) Graph-based detection, which analyzes the network of connections between accounts, devices, and entities to find fraud rings, money mule chains, and coordinated attacks. (5) Human-in-the-loop review for transactions that fall in the uncertain zone between clear-approve and clear-block, plus feedback loops to retrain models.

Where does KumoRFM fit in a fintech fraud detection stack?

KumoRFM fits at layer 4, the graph-based detection layer. It reads raw relational tables (accounts, transactions, devices, addresses, merchants) and automatically discovers predictive patterns across the full entity graph. This catches the organized fraud that layers 1-3 miss: fraud rings sharing devices, money mule chains moving funds through intermediary accounts, coordinated account takeovers from the same credential batch. KumoRFM does not replace the rules engine or behavioral analytics. It adds the network intelligence layer that catches the significant share of fraud losses that come from coordinated attacks. On the SAP SALT benchmark, KumoRFM achieves 91% accuracy vs 75% for PhD data scientists with XGBoost.

How do fintechs handle fraud false positives without losing customers?

False positives are the biggest operational challenge in fintech fraud detection. Block too aggressively and you lose customers. Block too little and you lose money. Fintechs manage this through layered decisioning: the rules engine handles clear-cut cases (known bad actors, sanctions hits), ML models score the middle tier with calibrated probability thresholds, and step-up verification (SMS, biometric, document upload) is triggered for the uncertain zone instead of outright blocking. Graph-based detection helps reduce false positives because network context disambiguates: a large transaction from an account connected to a healthy transaction graph is different from the same transaction from an account connected to known fraud nodes. This context reduces false positive rates by 40-60% compared to transaction-only scoring.
