Solution Background and Business Value

Financial institutions and law enforcement agencies need to detect and prevent money laundering before illicit funds leave an account. When laundering goes undetected, institutions can become liable and criminal activity can continue unchecked. The problem is harder still for cryptocurrency transactions, where users can easily create many accounts and a single transaction may involve multiple parties.

Graph Neural Networks (GNNs) are effective for identifying suspicious patterns in transaction networks that are difficult to detect using traditional fraud detection methods.

This document outlines how to:

  • Structure your data for money laundering detection.

  • Train a classifier using Kumo AI.

  • Deploy the model in real-world fraud detection systems.

Data Requirements and Schema

We start with a core set of tables and can extend the model by adding more fraud signals over time.

Core Tables

  1. Accounts Table

    • Stores account details.

    • Key attributes:

      • account_id: Unique identifier for each account.

      • Optional: Location, phone number, creation timestamp, account type, risk score.

  2. Transactions Table

    • Records all transactions (deposits, withdrawals, transfers).

    • Key attributes:

      • transaction_id: Unique identifier.

      • timestamp: Transaction time.

      • amount: Transaction value.

  3. Inputs Table

    • Tracks the source accounts for each transaction.

    • Key attributes:

      • transaction_id: Links to a transaction.

      • account_id: Links to the sender’s account.

      • timestamp: Time of transaction.

  4. Outputs Table

    • Tracks the destination accounts for each transaction.

    • Key attributes:

      • transaction_id: Links to a transaction.

      • account_id: Links to the receiver’s account.

      • timestamp: Time of transaction.

  5. Reports Table

    • Tracks accounts reported for money laundering.

    • Key attributes:

      • account_id: Links to an account.

      • timestamp: Time of report.

      • Optional: Reason, severity, reporting entity.
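To make the schema concrete, here is a minimal sketch of a single multi-party transaction in plain Python. The account IDs, amounts, and the helper name `parties` are hypothetical and chosen only for illustration. Two accounts send funds (two rows in inputs) and one account receives them (one row in outputs).

```python
from datetime import datetime

ts = datetime(2024, 1, 10, 12, 0)

# Illustrative rows only; the values are made up.
accounts = [{"account_id": "A"}, {"account_id": "B"}, {"account_id": "C"}]
transactions = [{"transaction_id": "T1", "timestamp": ts, "amount": 900.0}]
inputs = [  # one row per sending account
    {"transaction_id": "T1", "account_id": "A", "timestamp": ts},
    {"transaction_id": "T1", "account_id": "B", "timestamp": ts},
]
outputs = [  # one row per receiving account
    {"transaction_id": "T1", "account_id": "C", "timestamp": ts},
]
reports = [{"account_id": "C", "timestamp": datetime(2024, 1, 14)}]


def parties(transaction_id):
    """Return (senders, receivers) for one transaction, each sorted by account_id."""
    senders = sorted(
        r["account_id"] for r in inputs if r["transaction_id"] == transaction_id
    )
    receivers = sorted(
        r["account_id"] for r in outputs if r["transaction_id"] == transaction_id
    )
    return senders, receivers


senders, receivers = parties("T1")
```

Keeping inputs and outputs as separate link tables is what lets one transaction row connect any number of senders to any number of receivers.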

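Entity Relationship Diagram (ERD)

The tables are linked by the following foreign keys:

  • inputs.transaction_id references transactions.transaction_id.

  • inputs.account_id references accounts.account_id.

  • outputs.transaction_id references transactions.transaction_id.

  • outputs.account_id references accounts.account_id.

  • reports.account_id references accounts.account_id.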

Predictive Queries

To stop money laundering, we must predict fraud risk as soon as funds enter an account.

Money Laundering Prediction

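This model predicts the probability that an account will be reported for money laundering in the next N days. The WHERE clause limits predictions to accounts that have at least one row in the inputs table during the previous day: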

PREDICT COUNT(reports.*, 0, N, days) > 0
FOR EACH accounts.account_id
WHERE COUNT(inputs.*, -1, 0, days) > 0
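
To show what this query asks of each account, here is a rough sketch of the label logic in plain Python. It is not how Kumo computes training labels internally. The function names are hypothetical, and the window boundaries (start exclusive, end inclusive) are an assumption made for the example.

```python
from datetime import datetime, timedelta


def count_in_window(timestamps, anchor, start_days, end_days):
    """Count events t with anchor + start_days < t <= anchor + end_days.

    Assumption: start is exclusive and end is inclusive.
    """
    lo = anchor + timedelta(days=start_days)
    hi = anchor + timedelta(days=end_days)
    return sum(1 for t in timestamps if lo < t <= hi)


def training_label(report_times, input_times, anchor, horizon_days):
    """Return None if the account is filtered out, else the True/False label."""
    # WHERE COUNT(inputs.*, -1, 0, days) > 0
    if count_in_window(input_times, anchor, -1, 0) == 0:
        return None
    # PREDICT COUNT(reports.*, 0, N, days) > 0
    return count_in_window(report_times, anchor, 0, horizon_days) > 0


anchor = datetime(2024, 1, 10)
input_times = [datetime(2024, 1, 9, 18, 0)]  # activity within the last day
report_times = [datetime(2024, 1, 14)]       # reported 4 days after the anchor

label_active = training_label(report_times, input_times, anchor, horizon_days=10)
label_idle = training_label(report_times, [], anchor, horizon_days=10)
label_short = training_label(report_times, input_times, anchor, horizon_days=3)
```

In this example the account is labeled positive for a 10-day horizon, negative for a 3-day horizon, and is skipped entirely when it has no recent rows in inputs.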

Different Time Horizons

To detect different fraud patterns, we can train models for various time windows:

// Likely to be reported within 10 days
PREDICT COUNT(reports.*, 0, 10, days) > 0
FOR EACH accounts.account_id
WHERE COUNT(inputs.*, -1, 0, days) > 0

// Likely to be reported in 10-30 days
PREDICT COUNT(reports.*, 10, 30, days) > 0
FOR EACH accounts.account_id
WHERE COUNT(inputs.*, -1, 0, days) > 0

// Likely to be reported in 30-90 days
PREDICT COUNT(reports.*, 30, 90, days) > 0
FOR EACH accounts.account_id
WHERE COUNT(inputs.*, -1, 0, days) > 0
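
Because the three queries differ only in their time window, they can be generated from one template. The sketch below is one possible way to do that. The helper `build_report_query` and the dictionary keys are hypothetical names, not part of the Kumo SDK.

```python
def build_report_query(start_days: int, end_days: int) -> str:
    """Build the query text for reports in the window from start_days to end_days."""
    if not 0 <= start_days < end_days:
        raise ValueError("expected 0 <= start_days < end_days")
    return (
        f"PREDICT COUNT(reports.*, {start_days}, {end_days}, days) > 0\n"
        "FOR EACH accounts.account_id\n"
        "WHERE COUNT(inputs.*, -1, 0, days) > 0"
    )


# The three windows used above: 0-10, 10-30, and 30-90 days.
HORIZONS = [(0, 10), (10, 30), (30, 90)]
queries = {f"{start}-{end}d": build_report_query(start, end) for start, end in HORIZONS}
```

Each generated string can then be passed as the query text when building a predictive query, one model per horizon.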

Building Models with the Kumo SDK

1. Initialize the Kumo SDK

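import os
import kumoai as kumo

# Assumes the key is exported as KUMO_API_KEY (hypothetical variable name).
kumo.init(url="https://<customer_id>.kumoai.cloud/api", api_key=os.environ["KUMO_API_KEY"])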

2. Connect data

connector = kumo.S3Connector("s3://your-dataset-location/")

3. Select tables

accounts = kumo.Table.from_source_table(
    source_table=connector.table('accounts'),
    primary_key='account_id',
).infer_metadata()

transactions = kumo.Table.from_source_table(
    source_table=connector.table('transactions'),
    primary_key='transaction_id',  # target of the inputs/outputs foreign keys
    time_column='timestamp',
).infer_metadata()

inputs = kumo.Table.from_source_table(
    source_table=connector.table('inputs'),
    time_column='timestamp',
).infer_metadata()

outputs = kumo.Table.from_source_table(
    source_table=connector.table('outputs'),
    time_column='timestamp',
).infer_metadata()

reports = kumo.Table.from_source_table(
    source_table=connector.table('reports'),
    time_column='timestamp',
).infer_metadata()

4. Define graph schema

graph = kumo.Graph(
    tables={
        'accounts': accounts,
        'transactions': transactions,
        'inputs': inputs,
        'outputs': outputs,
        'reports': reports,
    },
    edges=[
        dict(src_table='inputs', fkey='transaction_id', dst_table='transactions'),
        dict(src_table='inputs', fkey='account_id', dst_table='accounts'),
        dict(src_table='outputs', fkey='transaction_id', dst_table='transactions'),
        dict(src_table='outputs', fkey='account_id', dst_table='accounts'),
        dict(src_table='reports', fkey='account_id', dst_table='accounts'),
    ],
)

graph.validate(verbose=True)
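
Each edge links a foreign key column in the source table to the primary key of the destination table, so every destination table needs a primary key. As a quick pre-check before building the graph, the same rule can be tested on a plain-dict copy of the schema. This is only an illustrative sketch and does not reproduce what graph.validate does internally. `TABLES`, `EDGES`, and `check_edges` are hypothetical names.

```python
# Plain-dict mirror of the tables; columns follow the schema section.
LINK_COLUMNS = ["transaction_id", "account_id", "timestamp"]
TABLES = {
    "accounts": {"primary_key": "account_id", "columns": ["account_id"]},
    "transactions": {
        "primary_key": "transaction_id",
        "columns": ["transaction_id", "timestamp", "amount"],
    },
    "inputs": {"primary_key": None, "columns": LINK_COLUMNS},
    "outputs": {"primary_key": None, "columns": LINK_COLUMNS},
    "reports": {"primary_key": None, "columns": ["account_id", "timestamp"]},
}

# (src_table, fkey, dst_table), matching the edges of the graph.
EDGES = [
    ("inputs", "transaction_id", "transactions"),
    ("inputs", "account_id", "accounts"),
    ("outputs", "transaction_id", "transactions"),
    ("outputs", "account_id", "accounts"),
    ("reports", "account_id", "accounts"),
]


def check_edges(tables, edges):
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []
    for src, fkey, dst in edges:
        if src not in tables or dst not in tables:
            problems.append(f"{src} -> {dst}: unknown table")
            continue
        if fkey not in tables[src]["columns"]:
            problems.append(f"{src}.{fkey}: column missing")
        if tables[dst]["primary_key"] is None:
            problems.append(f"{dst}: no primary key for {src}.{fkey}")
    return problems


problems = check_edges(TABLES, EDGES)

# Dropping the primary key on transactions breaks both edges that point at it.
broken = {**TABLES, "transactions": {**TABLES["transactions"], "primary_key": None}}
broken_problems = check_edges(broken, EDGES)
```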

5. Train the model

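pquery = kumo.PredictiveQuery(
    graph=graph,
    # Replace 10 with the desired horizon N, in days.
    query=(
        "PREDICT COUNT(reports.*, 0, 10, days) > 0 "
        "FOR EACH accounts.account_id "
        "WHERE COUNT(inputs.*, -1, 0, days) > 0"
    ),
)
pquery.validate(verbose=True)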

model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)
training_job = trainer.fit(
    graph=graph,
    train_table=pquery.generate_training_table(non_blocking=True),
    non_blocking=False,
)
print(f"Training metrics: {training_job.metrics()}")