Solution Background and Business Value

Payback abuse is a form of fraud commonly found on buy-now-pay-later (BNPL) platforms: a user places an order on credit and never repays it. It is closely related to credit card fraud and certain types of insurance fraud. The core challenge in detecting payback abuse is making transaction-level decisions in real time, so that fraudulent orders are stopped before financial losses occur.

An effective machine learning (ML) model helps businesses:

  • Reduce financial losses by blocking high-risk transactions before they go through.

  • Improve fraud detection rates by analyzing transaction patterns.

  • Minimize false positives to avoid blocking legitimate users.

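Key business metrics for evaluating fraud detection performance are dollar-amount-weighted recall@K and precision@K. Weighting by transaction amount ties evaluation to financial impact: among the K transactions the model ranks as riskiest, recall@K measures how much of the total fraudulent dollar volume is caught, and precision@K measures how much of the flagged dollar volume is actually fraudulent.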
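
Definitions of dollar-weighted metrics vary between teams. The sketch below assumes one common reading: rank transactions by predicted fraud score, flag the top K, then divide fraud dollars among the flagged set by (a) all flagged dollars for precision and (b) all fraud dollars for recall. The function name and its inputs are illustrative, not part of any Kumo API.

```python
from typing import Sequence


def dollar_weighted_precision_recall_at_k(
    scores: Sequence[float],
    amounts: Sequence[float],
    is_fraud: Sequence[bool],
    k: int,
) -> tuple[float, float]:
    """Return (precision@K, recall@K), both weighted by transaction amount."""
    if not (len(scores) == len(amounts) == len(is_fraud)):
        raise ValueError("scores, amounts and is_fraud must have equal length")

    # Rank transactions by predicted fraud score, highest first, and keep the top K.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]

    flagged_amount = sum(amounts[i] for i in top_k)
    caught_fraud_amount = sum(amounts[i] for i in top_k if is_fraud[i])
    total_fraud_amount = sum(a for a, f in zip(amounts, is_fraud) if f)

    # Guard against empty denominators (K = 0, or no fraud in the evaluation set).
    precision = caught_fraud_amount / flagged_amount if flagged_amount else 0.0
    recall = caught_fraud_amount / total_fraud_amount if total_fraud_amount else 0.0
    return precision, recall
```

For example, with scores [0.9, 0.8, 0.3, 0.1], amounts [100, 50, 200, 25] and the first and third transactions fraudulent, K = 2 flags $150 of which $100 is fraud (precision 2/3) out of $300 total fraud (recall 1/3).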

Data Requirements and Schema

To build an effective fraud detection model, we need a structured dataset that captures user transactions, payment history, and account details.

Core Tables

  1. Transactions/Orders Table

    • Stores details about each transaction.

    • Key attributes:

      • order_id: Unique transaction identifier.

      • account_id: Links the transaction to a user account.

      • timestamp: Time of transaction.

      • Optional: Order value, merchant details, transaction type.

  2. Payments Table

    • Tracks payments made for each transaction.

    • Key attributes:

      • payment_id: Unique payment identifier.

      • order_id: Links the payment to a specific order.

      • timestamp: Time of payment.

      • outstanding_amt: Remaining balance for the order.

      • Optional: Payment method, status.

  3. Accounts Table

    • Stores user account details.

    • Key attributes:

      • account_id: Unique account identifier.

      • Optional: User demographics, credit history, risk score.
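
The core tables and their key relationships can be written down as simple record types. This is a sketch using Python dataclasses; the find_dangling_keys helper is hypothetical and only illustrates the kind of referential-integrity check worth running before loading data, since an order whose account_id matches no account, or a payment whose order_id matches no order, cannot be linked to the rest of the data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Account:
    account_id: str          # primary key


@dataclass
class Order:
    order_id: str            # primary key
    account_id: str          # foreign key -> Account.account_id
    timestamp: datetime
    order_value: float       # optional in the schema, but needed for the initial payment record


@dataclass
class Payment:
    payment_id: str          # primary key
    order_id: str            # foreign key -> Order.order_id
    timestamp: datetime
    outstanding_amt: float   # remaining balance after this payment


def find_dangling_keys(
    accounts: list[Account], orders: list[Order], payments: list[Payment]
) -> tuple[list[str], list[str]]:
    """Return (orders with an unknown account, payments with an unknown order)."""
    account_ids = {a.account_id for a in accounts}
    order_ids = {o.order_id for o in orders}
    bad_orders = [o.order_id for o in orders if o.account_id not in account_ids]
    bad_payments = [p.payment_id for p in payments if p.order_id not in order_ids]
    return bad_orders, bad_payments
```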

Additional Tables (Optional)

For improved fraud detection, consider including:

  • Merchants Table: Static data about merchants (e.g., reputation, fraud risk).

  • Items Table: Information about products involved in transactions.

  • Account 360 Table: Aggregated account data (e.g., transaction history, credit checks, previous fraud cases).
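
An Account 360 table is typically built by aggregating the core tables per account. The sketch below assumes orders and payments arrive as plain dicts using the column names from the schema above; the output columns (n_orders, total_value, open_balance) are hypothetical examples, not a required format.

```python
from collections import defaultdict


def build_account_360(orders, payments):
    """Aggregate per-account history from order and payment records.

    orders:   iterable of dicts with keys order_id, account_id, order_value
    payments: iterable of dicts with keys order_id, timestamp, outstanding_amt
    """
    # Keep the most recent payment record per order; its outstanding_amt is the current balance.
    latest = {}
    for p in payments:
        prev = latest.get(p["order_id"])
        if prev is None or p["timestamp"] > prev["timestamp"]:
            latest[p["order_id"]] = p

    summary = defaultdict(lambda: {"n_orders": 0, "total_value": 0.0, "open_balance": 0.0})
    for o in orders:
        row = summary[o["account_id"]]
        row["n_orders"] += 1
        row["total_value"] += o["order_value"]
        if o["order_id"] in latest:
            row["open_balance"] += latest[o["order_id"]]["outstanding_amt"]
    return dict(summary)
```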

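Entity Relationship Diagram (ERD)

The core tables are linked through foreign keys: each order references its account via orders.account_id, and each payment references its order via payments.order_id.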

Predictive Queries

The predictive query depends on how fraudulent transactions are defined. Two approaches are commonly used:

1. Unpaid Orders After X Days

If a fraudulent order is defined as one that remains unpaid after X days, we can train a model to predict this behavior:

PREDICT LAST(payments.outstanding_amt, 0, X, days) != 0
FOR EACH orders.order_id

Requirements:

  • The payments table must include an initial payment record for each order with outstanding_amt = order_value.

  • This ensures that a negative label (not fraud) is generated for orders with no remaining balance.
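
To make the label concrete, the sketch below approximates, for a single order, what the query above computes. It assumes the window runs from the order timestamp to X days later, inclusive at both ends; Kumo's exact window boundaries and anchor-time semantics may differ, so treat this as an illustration rather than a reimplementation. The function returns None when no payment record falls inside the window; the initial record written at order time guarantees every order has at least one.

```python
from datetime import datetime, timedelta


def unpaid_after_x_days(order_ts: datetime, payments, x_days: int):
    """Approximate LAST(payments.outstanding_amt, 0, X, days) != 0 for one order.

    payments: list of (timestamp, outstanding_amt) tuples for this order, including
    the initial record written at order time with outstanding_amt = order_value.
    Returns True (fraud), False (not fraud), or None if no record is in the window.
    """
    window_end = order_ts + timedelta(days=x_days)
    in_window = [(ts, amt) for ts, amt in payments if order_ts <= ts <= window_end]
    if not in_window:
        return None  # LAST() has nothing to aggregate
    # LAST: take the outstanding amount from the latest record inside the window.
    _, last_amt = max(in_window, key=lambda rec: rec[0])
    return last_amt != 0
```

An order repaid on day 10 gets False for X = 30, while the same order repaid on day 40 still gets True, because the repayment falls outside the window.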

2. Custom Fraud Labeling

If fraud is defined based on multiple signals (e.g., previous fraud history, chargeback patterns), we can store precomputed fraud labels in the orders table:

PREDICT orders.fraud_label == 1
FOR EACH orders.order_id

Here, fraud_label is a binary column (1 = fraudulent, 0 = legitimate, null = not yet labeled and pending prediction).
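
How the precomputed label is derived is up to each business. As a purely hypothetical example, the rule below combines a few signals into the 1 / 0 / null convention: the signal names, the chargeback threshold, and the 30-day maturity period are all illustrative assumptions, not recommendations.

```python
from typing import Optional


def custom_fraud_label(
    days_since_order: int,
    chargebacks: int,
    prior_fraud_cases: int,
    outstanding_amt: float,
    maturity_days: int = 30,  # hypothetical: how long before an outcome is considered final
) -> Optional[int]:
    """Hypothetical rule mapping several signals to 1 (fraud), 0 (legitimate) or None."""
    if prior_fraud_cases > 0 or chargebacks >= 2:
        return 1  # strong signals label the order fraudulent immediately
    if days_since_order < maturity_days:
        return None  # outcome not known yet: leave it for the model to predict
    return 1 if outstanding_amt > 0 else 0
```

Rows that come back as None are exactly the ones the query scores; rows with 0 or 1 serve as training examples.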

Deployment Strategy

1. Batch Fraud Detection for Inspection Teams

  • Suitable for scenarios without strict real-time requirements.

  • Predictions are generated in batches (e.g., hourly or daily).

  • Fraud analysts can review flagged transactions manually.

To restrict predictions to transactions within a time window, apply an entity filter:

ENTITY FILTER: orders.timestamp > MIN_TIMESTAMP
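
On the consuming side, an inspection team typically receives only the riskiest recent orders. The sketch below assumes batch scores have already been exported as a mapping from order_id to fraud score; build_review_queue is a hypothetical helper that applies the same timestamp filter and returns the top K orders for manual review.

```python
from datetime import datetime


def build_review_queue(orders, scores, min_timestamp: datetime, k: int):
    """Return the K highest-scoring orders placed after min_timestamp.

    orders: list of dicts with 'order_id' and 'timestamp'
    scores: dict mapping order_id -> fraud score from the latest batch run
    """
    # Same condition as the entity filter orders.timestamp > MIN_TIMESTAMP.
    recent = [o for o in orders if o["timestamp"] > min_timestamp]
    # Orders missing from the batch output are skipped rather than treated as low risk.
    scored = [(scores[o["order_id"]], o["order_id"]) for o in recent if o["order_id"] in scores]
    scored.sort(reverse=True)  # highest score first
    return [order_id for _, order_id in scored[:k]]
```

Choosing K to match the team's daily review capacity links this queue directly to the precision@K and recall@K metrics described earlier.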

2. Real-Time Fraud Detection Using Embeddings

For real-time fraud detection, Kumo embeddings can be combined with real-time transaction features to produce instant fraud scores.

  1. Generate user and transaction embeddings in batches.

  2. Store embeddings in a feature store for quick retrieval.

  3. Combine embeddings with real-time transaction features to calculate a fraud risk score at the time of purchase.

Building Models with the Kumo SDK

1. Initialize the Kumo SDK

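import os

import kumoai as kumo

# Read the API key from an environment variable rather than hard-coding it.
# The variable name KUMO_API_KEY is only an example.
kumo.init(url="https://<customer_id>.kumoai.cloud/api", api_key=os.environ["KUMO_API_KEY"])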

2. Connect data

connector = kumo.S3Connector("s3://your-dataset-location/")

3. Select tables

accounts = kumo.Table.from_source_table(
    source_table=connector.table('accounts'),
    primary_key='account_id',
).infer_metadata()

orders = kumo.Table.from_source_table(
    source_table=connector.table('orders'),
    primary_key='order_id',  # required: payments.order_id and the query's FOR EACH refer to it
    time_column='timestamp',
).infer_metadata()

payments = kumo.Table.from_source_table(
    source_table=connector.table('payments'),
    primary_key='payment_id',
    time_column='timestamp',
).infer_metadata()

4. Create graph schema

graph = kumo.Graph(
    tables={
        'accounts': accounts,
        'orders': orders,
        'payments': payments,
    },
    edges=[
        dict(src_table='orders', fkey='account_id', dst_table='accounts'),
        dict(src_table='payments', fkey='order_id', dst_table='orders'),
    ],
)

graph.validate(verbose=True)

5. Train the model

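# X is the repayment window from the query above; 30 days is only an example value.
X_DAYS = 30

pquery = kumo.PredictiveQuery(
    graph=graph,
    query=(
        f"PREDICT LAST(payments.outstanding_amt, 0, {X_DAYS}, days) != 0 "
        "FOR EACH orders.order_id"
    ),
)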
pquery.validate(verbose=True)

model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)
training_job = trainer.fit(
    graph=graph,
    train_table=pquery.generate_training_table(non_blocking=True),
    non_blocking=False,
)
print(f"Training metrics: {training_job.metrics()}")