Solution Background and Business Value

Credit card fraud is a widespread issue affecting financial institutions, businesses, and consumers. Fraud occurs when a malicious actor gains unauthorized access to a customer’s credit card and makes fraudulent transactions. Common fraud methods include:

  • Physical theft of the card.

  • Skimming card details from payment terminals.

  • Online breaches where card data is stolen and misused.

To mitigate financial losses, businesses can use machine learning models to detect fraud in real time, allowing them to:

  • Identify suspicious transactions early and intervene before money is lost.

  • Reduce false positives, ensuring legitimate transactions are not blocked.

  • Enhance fraud detection accuracy by leveraging graph-based patterns in transaction networks.

Data Requirements and Schema

Kumo AI processes relational data as interconnected tables using Graph Neural Networks (GNNs). This approach allows the model to learn from transaction patterns, account behavior, and merchant interactions without extensive feature engineering.

Core Tables

  1. Transactions Table

    • Stores all recorded transactions.

    • Key attributes:

      • transaction_id: Unique identifier for each transaction.

      • timestamp: When the transaction occurred.

      • credit_card_id: Links transaction to a credit card.

      • merchant_id: Links transaction to a merchant.

      • Optional: Location, currency, amount, transaction type.

  2. Credit Cards Table

    • Represents unique credit cards in the system.

    • Key attributes:

      • credit_card_id: Unique identifier.

      • cc_open_date: Date when the card was issued.

      • cc_close_date: Date when the card was closed (if applicable).

      • Optional: Credit limit, APR, fraud risk score.

  3. Fraud Reports Table

    • Stores fraud labels for transactions.

    • Key attributes:

      • transaction_id: Links to the transaction flagged as fraudulent.

      • timestamp: When the fraud report was filed.

      • label: 1 if fraudulent, 0 if legitimate.
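
As a rough illustration of how the three core tables link together, the sketch below models one row of each table as a Python dataclass and derives graph edges from the foreign keys. The class names, the `build_edges` helper, and the sample values are hypothetical and not part of the Kumo SDK.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CreditCard:
    credit_card_id: str
    cc_open_date: datetime
    cc_close_date: Optional[datetime] = None  # None while the card is open


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    timestamp: datetime
    credit_card_id: str  # foreign key -> CreditCard.credit_card_id
    merchant_id: str


@dataclass(frozen=True)
class FraudReport:
    transaction_id: str  # foreign key -> Transaction.transaction_id
    timestamp: datetime
    label: int  # 1 if fraudulent, 0 if legitimate


def build_edges(cards, transactions, reports):
    """Return (source, destination) node pairs implied by the foreign keys.

    Raises ValueError if a foreign key points at a missing row.
    """
    card_ids = {c.credit_card_id for c in cards}
    txn_ids = {t.transaction_id for t in transactions}
    edges = []
    for t in transactions:
        if t.credit_card_id not in card_ids:
            raise ValueError(f"unknown credit_card_id: {t.credit_card_id}")
        edges.append((("transactions", t.transaction_id),
                      ("credit_cards", t.credit_card_id)))
    for r in reports:
        if r.transaction_id not in txn_ids:
            raise ValueError(f"unknown transaction_id: {r.transaction_id}")
        edges.append((("fraud_reports", r.transaction_id),
                      ("transactions", r.transaction_id)))
    return edges


# Hypothetical sample rows.
cards = [CreditCard("cc_1", datetime(2023, 1, 1))]
transactions = [Transaction("t_1", datetime(2024, 5, 1, 12, 0), "cc_1", "m_1")]
reports = [FraudReport("t_1", datetime(2024, 5, 3), 1)]
edges = build_edges(cards, transactions, reports)
```

Each edge connects a row to the row it references, which is the structure a graph-based model traverses when it learns from related transactions, cards, and reports.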

Additional Tables (Optional)

  • Users Table: Links credit cards to customers.

  • User Stats Table: Stores aggregated stats like transaction count and total spend.

  • Merchants Table: Stores merchant details (e.g., category, location, risk rating).

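Entity Relationship Diagram (ERD)

The core tables are linked by two foreign keys:

  • transactions.credit_card_id references credit_cards.credit_card_id.

  • fraud_reports.transaction_id references transactions.transaction_id.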

Predictive Query for Credit Card Fraud Detection

1. Transaction-Level Fraud Detection

Predict whether a transaction is fraudulent based on past fraud reports:

PREDICT transactions.LABEL
FOR EACH transactions.transaction_id

  • This query expects the fraud label to be present as a LABEL column on the transactions table. At inference time, leave LABEL empty for new transactions to generate fraud risk scores.
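
The query reads LABEL from the transactions table, while the schema stores labels in the fraud reports table. One possible way to reconcile the two, sketched below in plain Python, is to copy each reported label onto its transaction and leave unreported transactions unlabeled (`None`) so they can be scored at inference time. The `attach_labels` helper and the dictionary layout are assumptions for illustration, not Kumo functionality.

```python
def attach_labels(transactions, fraud_reports):
    """Return copies of the transaction rows with a LABEL field added.

    transactions: list of dicts with a 'transaction_id' key.
    fraud_reports: list of dicts with 'transaction_id', 'timestamp', 'label'.
    If a transaction has several reports, the most recent one wins.
    """
    latest = {}
    for report in sorted(fraud_reports, key=lambda r: r["timestamp"]):
        latest[report["transaction_id"]] = report["label"]
    return [
        {**txn, "LABEL": latest.get(txn["transaction_id"])}
        for txn in transactions
    ]


# Hypothetical rows; timestamps are ISO strings, which sort chronologically.
transactions = [
    {"transaction_id": "t_1", "amount": 25.0},
    {"transaction_id": "t_2", "amount": 990.0},
    {"transaction_id": "t_3", "amount": 12.5},  # new, not yet reported
]
fraud_reports = [
    {"transaction_id": "t_1", "timestamp": "2024-05-02T09:00:00", "label": 0},
    {"transaction_id": "t_2", "timestamp": "2024-05-02T10:00:00", "label": 0},
    {"transaction_id": "t_2", "timestamp": "2024-05-04T08:30:00", "label": 1},
]
labeled = attach_labels(transactions, fraud_reports)
```

Letting the most recent report win is a design choice for this sketch; a real pipeline might instead treat any fraud report as final.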

2. Time-Based Fraud Prediction

Predict whether a fraud report will be linked to a transaction in the next 30 days:

PREDICT SUM(fraud_reports.LABEL, 0, 30, days) > 0
FOR EACH transactions.transaction_id
ASSUMING COUNT(fraud_reports.*, 0, 30, days) >= 1

3. Credit Card Risk Prediction

Predict whether a credit card will be associated with fraudulent transactions in the next 7 days:

PREDICT SUM(transactions.LABEL, 0, 7, days) > 0
FOR EACH credit_cards.credit_card_id
ASSUMING COUNT(transactions.LABEL, 0, 7, days) >= 1
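
To make the windowed targets in the time-based and card-level queries concrete, the sketch below computes a card-level target by hand for one anchor time: whether any transaction labeled as fraud falls within the following 7 days, restricted to cards with at least one labeled transaction in that window. The `card_fraud_target` helper is hypothetical, and the window boundaries (start exclusive, end inclusive) are an assumption rather than a statement of how Kumo evaluates the window.

```python
from datetime import datetime, timedelta


def card_fraud_target(transactions, anchor, days=7):
    """Map credit_card_id -> True/False for one anchor time.

    A card is included only if it has at least one labeled transaction in
    the window (anchor, anchor + days]; the value is True when any of those
    labels equals 1. The boundary convention is an assumption.
    """
    end = anchor + timedelta(days=days)
    label_sums = {}
    for txn in transactions:
        if txn["label"] is None:
            continue  # unlabeled rows contribute nothing
        if anchor < txn["timestamp"] <= end:
            card = txn["credit_card_id"]
            label_sums[card] = label_sums.get(card, 0) + txn["label"]
    return {card: total > 0 for card, total in label_sums.items()}


# Hypothetical transactions with labels already attached.
transactions = [
    {"credit_card_id": "cc_1", "timestamp": datetime(2024, 5, 2), "label": 0},
    {"credit_card_id": "cc_1", "timestamp": datetime(2024, 5, 6), "label": 1},
    {"credit_card_id": "cc_2", "timestamp": datetime(2024, 5, 3), "label": 0},
    {"credit_card_id": "cc_3", "timestamp": datetime(2024, 5, 20), "label": 1},
]
targets = card_fraud_target(transactions, anchor=datetime(2024, 5, 1))
```

Here `cc_3` is left out of the result because its only transaction falls outside the 7-day window, which mirrors the role of the ASSUMING clause.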

Deployment Strategy

1. Batch Predictions for Fraud Analysts

  • Fraud teams review high-risk transactions flagged by the ML model.

  • Predictions are generated hourly or daily in batch mode.

  • Fraud analysts label new fraudulent transactions, improving the model over time.

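A filter such as the following can be added to the predictive query to restrict each batch run to transactions newer than a cutoff, where MIN_TIMESTAMP is a placeholder for that cutoff:

WHERE transactions.TIMESTAMP > MIN_TIMESTAMP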
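
A minimal sketch of the batch review flow described above: keep only transactions newer than a cutoff, then hand analysts the highest-scoring ones first. The `select_for_review` helper, the `score` field, and the sample values are hypothetical; in practice the scores would come from the model's batch predictions.

```python
from datetime import datetime


def select_for_review(scored, min_timestamp, top_k=2):
    """Return the top_k highest-scoring transactions after min_timestamp."""
    recent = [row for row in scored if row["timestamp"] > min_timestamp]
    recent.sort(key=lambda row: row["score"], reverse=True)
    return recent[:top_k]


# Hypothetical batch of scored transactions.
scored = [
    {"transaction_id": "t_1", "timestamp": datetime(2024, 5, 1, 8), "score": 0.91},
    {"transaction_id": "t_2", "timestamp": datetime(2024, 5, 2, 9), "score": 0.12},
    {"transaction_id": "t_3", "timestamp": datetime(2024, 5, 2, 10), "score": 0.87},
    {"transaction_id": "t_4", "timestamp": datetime(2024, 5, 2, 11), "score": 0.45},
]
queue = select_for_review(scored, min_timestamp=datetime(2024, 5, 2))
```

Capping the queue with `top_k` keeps the analyst workload bounded; the right size depends on team capacity and is not specified by the source.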

2. Real-Time Fraud Detection

  • The system generates instant fraud risk scores when a transaction occurs.

  • High-risk transactions trigger manual review or two-factor authentication.

  • Embeddings learned by the model can supplement existing rule-based fraud checks as additional input signals.
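
The routing step in the list above might look like the following sketch, which maps a fraud risk score to an action. The `route_transaction` helper, the action names, and both thresholds are hypothetical and would need tuning against real false-positive and fraud-loss rates.

```python
def route_transaction(score, review_threshold=0.9, step_up_threshold=0.6):
    """Map a fraud risk score in [0, 1] to an action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= review_threshold:
        return "manual_review"
    if score >= step_up_threshold:
        return "two_factor_auth"
    return "approve"


# Three hypothetical scores: low, medium, and high risk.
actions = [route_transaction(s) for s in (0.05, 0.72, 0.95)]
```

Using two thresholds separates the cheap intervention (two-factor authentication) from the expensive one (manual review), so that only the riskiest transactions consume analyst time.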

Building Models with the Kumo SDK

1. Initialize the Kumo SDK

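import os
import kumoai as kumo

# Assumption: the API key is stored in the KUMO_API_KEY environment variable.
API_KEY = os.environ["KUMO_API_KEY"]
kumo.init(url="https://<customer_id>.kumoai.cloud/api", api_key=API_KEY)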

2. Connect data

connector = kumo.S3Connector("s3://your-dataset-location/")

3. Select tables

credit_cards = kumo.Table.from_source_table(
    source_table=connector.table('credit_cards'),
    primary_key='credit_card_id',
).infer_metadata()

transactions = kumo.Table.from_source_table(
    source_table=connector.table('transactions'),
    primary_key='transaction_id',  # target of the fraud_reports edge and the query entity
    time_column='timestamp',
).infer_metadata()

fraud_reports = kumo.Table.from_source_table(
    source_table=connector.table('fraud_reports'),
    time_column='timestamp',
).infer_metadata()

4. Create graph schema

graph = kumo.Graph(
    tables={
        'credit_cards': credit_cards,
        'transactions': transactions,
        'fraud_reports': fraud_reports,
    },
    edges=[
        dict(src_table='transactions', fkey='credit_card_id', dst_table='credit_cards'),
        dict(src_table='fraud_reports', fkey='transaction_id', dst_table='transactions'),
    ],
)

graph.validate(verbose=True)

5. Train the model

pquery = kumo.PredictiveQuery(
    graph=graph,
    query="PREDICT transactions.LABEL FOR EACH transactions.transaction_id"
)
pquery.validate(verbose=True)

model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)
training_job = trainer.fit(
    graph=graph,
    train_table=pquery.generate_training_table(non_blocking=True),
    non_blocking=False,
)
print(f"Training metrics: {training_job.metrics()}")