Payback Abuse Detection Solution
Solution Background and Business Value
Payback abuse is a form of fraud commonly found in buy-now-pay-later (BNPL) platforms. It is closely related to credit card fraud and certain types of insurance fraud. The core challenge in detecting payback abuse is making real-time transaction-level decisions to prevent fraudulent activity before financial losses occur.
An effective machine learning (ML) model helps businesses:
- Reduce financial losses by blocking high-risk transactions before they go through.
- Improve fraud detection rates by analyzing transaction patterns.
- Minimize false positives to avoid blocking legitimate users.
Key business metrics for evaluating fraud detection performance are dollar-amount-weighted recall@K and precision@K. Weighting each transaction by its monetary value means the model is judged on the losses it prevents rather than on raw transaction counts.
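As a rough illustration of the weighted metrics, the sketch below ranks transactions by model score, flags the top K, and weights every transaction by its amount. The function name and the `(score, amount, is_fraud)` tuple layout are assumptions made for this example, not part of any Kumo API.

```python
from typing import List, Tuple

# Each transaction is (model_score, amount, is_fraud). Hypothetical layout.
Scored = Tuple[float, float, bool]


def weighted_precision_recall_at_k(scored: List[Scored], k: int) -> Tuple[float, float]:
    """Amount-weighted precision@K and recall@K for the K highest-scored transactions."""
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    top_k = ranked[:k]

    flagged_amt = sum(amt for _, amt, _ in top_k)                    # money in flagged orders
    caught_amt = sum(amt for _, amt, fraud in top_k if fraud)        # fraudulent money flagged
    total_fraud_amt = sum(amt for _, amt, fraud in ranked if fraud)  # all fraudulent money

    precision = caught_amt / flagged_amt if flagged_amt else 0.0
    recall = caught_amt / total_fraud_amt if total_fraud_amt else 0.0
    return precision, recall


transactions = [
    (0.9, 500.0, True),
    (0.8, 100.0, False),
    (0.4, 300.0, True),
    (0.1, 50.0, False),
]
precision, recall = weighted_precision_recall_at_k(transactions, k=2)
print(f"precision@2={precision:.3f} recall@2={recall:.3f}")
```

In this toy data the top two transactions contain 500 of the 600 flagged dollars and 500 of the 800 fraudulent dollars, so a single large fraudulent order dominates both numbers, which is the intended effect of the weighting.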
Data Requirements and Schema
To build an effective fraud detection model, we need a structured dataset that captures user transactions, payment history, and account details.
Core Tables
- Transactions/Orders Table
  - Stores details about each transaction.
  - Key attributes:
    - order_id: Unique transaction identifier.
    - account_id: Links the transaction to a user account.
    - timestamp: Time of transaction.
    - Optional: Order value, merchant details, transaction type.
- Payments Table
  - Tracks payments made for each transaction.
  - Key attributes:
    - payment_id: Unique payment identifier.
    - order_id: Links the payment to a specific order.
    - timestamp: Time of payment.
    - outstanding_amt: Remaining balance for the order.
    - Optional: Payment method, status.
- Accounts Table
  - Stores user account details.
  - Key attributes:
    - account_id: Unique account identifier.
    - Optional: User demographics, credit history, risk score.
Additional Tables (Optional)
For improved fraud detection, consider including:
- Merchants Table: Static data about merchants (e.g., reputation, fraud risk).
- Items Table: Information about products involved in transactions.
- Account 360 Table: Aggregated account data (e.g., transaction history, credit checks, previous fraud cases).
Entity Relationship Diagram (ERD)
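The relationships implied by the key attributes of the core tables (each order belongs to one account, each payment belongs to one order) can be sketched in plain Python. The dataclasses and the `dangling_keys` helper are hypothetical names used only for illustration; they show the foreign-key structure, not how tables are declared in Kumo.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Account:
    account_id: str        # primary key


@dataclass
class Order:
    order_id: str          # primary key
    account_id: str        # foreign key -> Account.account_id
    timestamp: datetime


@dataclass
class Payment:
    payment_id: str        # primary key
    order_id: str          # foreign key -> Order.order_id
    timestamp: datetime
    outstanding_amt: float


def dangling_keys(accounts: List[Account], orders: List[Order],
                  payments: List[Payment]) -> List[str]:
    """List foreign keys that do not resolve to a row in the parent table."""
    account_ids = {a.account_id for a in accounts}
    order_ids = {o.order_id for o in orders}
    problems = []
    for o in orders:
        if o.account_id not in account_ids:
            problems.append(f"order {o.order_id}: unknown account {o.account_id}")
    for p in payments:
        if p.order_id not in order_ids:
            problems.append(f"payment {p.payment_id}: unknown order {p.order_id}")
    return problems


accounts = [Account("a1")]
orders = [Order("o1", "a1", datetime(2024, 1, 1))]
payments = [
    Payment("p1", "o1", datetime(2024, 1, 1), 100.0),
    Payment("p2", "o9", datetime(2024, 1, 2), 40.0),  # points at a missing order
]
problems = dangling_keys(accounts, orders, payments)
print(problems)
```

A check like this is worth running before building the graph, since payments that do not link to an order cannot contribute to that order's label.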
Predictive Queries
The predictive query depends on how fraudulent transactions are defined. Two approaches are commonly used:
1. Unpaid Orders After X Days
If a fraudulent order is defined as one that remains unpaid after X days, we can train a model to predict whether each order will still have an outstanding balance X days after it is placed.
Requirements:
- The payments table must include an initial payment record for each order with outstanding_amt = order_value.
- This ensures that a negative label (not fraud) is generated for orders with no remaining balance.
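To make the "unpaid after X days" definition concrete, the sketch below derives the label offline from payment records. It is an assumption-laden illustration in plain Python, not the predictive query itself: the function name, the tuple layout, and the choice to return `None` for orders with no payment record in the window are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# (order_id, payment timestamp, outstanding_amt). Hypothetical row layout.
PaymentRow = Tuple[str, datetime, float]


def unpaid_after_x_days(order_times: Dict[str, datetime],
                        payments: List[PaymentRow],
                        x_days: int) -> Dict[str, Optional[bool]]:
    """Label an order True if its latest balance within X days is still above zero."""
    labels: Dict[str, Optional[bool]] = {}
    for order_id, placed_at in order_times.items():
        cutoff = placed_at + timedelta(days=x_days)
        in_window = [(ts, amt) for oid, ts, amt in payments
                     if oid == order_id and placed_at <= ts <= cutoff]
        if not in_window:
            labels[order_id] = None  # no payment record, so no label can be derived
            continue
        # The most recent record in the window carries the remaining balance.
        _, last_outstanding = max(in_window, key=lambda row: row[0])
        labels[order_id] = last_outstanding > 0
    return labels


order_times = {
    "o1": datetime(2024, 1, 1),
    "o2": datetime(2024, 1, 1),
    "o3": datetime(2024, 1, 1),
}
payments = [
    ("o1", datetime(2024, 1, 1), 100.0),  # initial record: outstanding_amt = order value
    ("o1", datetime(2024, 1, 10), 0.0),   # paid in full
    ("o2", datetime(2024, 1, 1), 80.0),   # initial record only, never paid
]
labels = unpaid_after_x_days(order_times, payments, x_days=30)
print(labels)
```

Order `o3` has no payment record at all and therefore gets no label in this sketch, which illustrates why an initial payment record per order matters.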
2. Custom Fraud Labeling
If fraud is defined based on multiple signals (e.g., previous fraud history, chargeback patterns), we can store precomputed fraud labels in a fraud_label column of the transactions table and train the model to predict that column.
Here, fraud_label is a boolean column (1 = fraudulent, 0 = legitimate, None = pending prediction).
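The three states of the label column can be illustrated with a small sketch that separates rows with a known label (usable for training) from rows still awaiting a prediction. The dictionary layout and the `split_by_label` helper are hypothetical and stand in for whatever storage the transactions table actually uses.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_label(transactions: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate rows with a known fraud_label from rows pending prediction."""
    labeled = [t for t in transactions if t["fraud_label"] is not None]
    pending = [t for t in transactions if t["fraud_label"] is None]
    return labeled, pending


transactions = [
    {"order_id": "o1", "fraud_label": 1},     # fraudulent
    {"order_id": "o2", "fraud_label": 0},     # legitimate
    {"order_id": "o3", "fraud_label": None},  # pending prediction
]
labeled, pending = split_by_label(transactions)
print(len(labeled), "labeled;", len(pending), "pending")
```

Comparing against `None` explicitly matters here: a plain truthiness test would treat the legitimate label `0` as missing.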
Deployment Strategy
1. Batch Fraud Detection for Inspection Teams
- Suitable for scenarios without strict real-time requirements.
- Predictions are generated in batches (e.g., every hour, daily).
- Fraud analysts can review flagged transactions manually.
Each batch run can filter transactions to a specific time window.
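A minimal sketch of such a window filter in plain Python is shown below. The `in_window` helper and the row layout are assumptions for illustration; in practice the filter would usually be expressed in the query or data pipeline that feeds the batch job.

```python
from datetime import datetime, timedelta
from typing import Dict, List

Row = Dict[str, object]


def in_window(transactions: List[Row], start: datetime, end: datetime) -> List[Row]:
    """Keep transactions with start <= timestamp < end (half-open interval)."""
    return [t for t in transactions if start <= t["timestamp"] < end]


transactions = [
    {"order_id": "o1", "timestamp": datetime(2024, 1, 1, 9, 30)},
    {"order_id": "o2", "timestamp": datetime(2024, 1, 1, 10, 0)},
    {"order_id": "o3", "timestamp": datetime(2024, 1, 1, 10, 45)},
]

# Hourly batch covering 10:00 up to, but not including, 11:00.
start = datetime(2024, 1, 1, 10, 0)
batch = in_window(transactions, start, start + timedelta(hours=1))
print([t["order_id"] for t in batch])
```

The interval is half-open so that consecutive batches neither skip nor double-score a transaction that falls exactly on a boundary.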
2. Real-Time Fraud Detection Using Embeddings
For real-time fraud detection, Kumo embeddings can be combined with real-time transaction features to produce instant fraud scores.
- Generate user and transaction embeddings in batches.
- Store embeddings in a feature store for quick retrieval.
- Combine embeddings with real-time transaction features to calculate a fraud risk score at the time of purchase.
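The three steps above can be sketched as follows. Everything here is a simplifying assumption: an in-memory dictionary stands in for the feature store, a logistic model with hand-picked weights stands in for whatever downstream scorer is actually trained, and the zero-vector fallback for unseen accounts is just one possible choice. None of the names come from the Kumo SDK.

```python
import math
from typing import Dict, List, Sequence


def fraud_score(account_id: str,
                realtime_features: Sequence[float],
                embedding_store: Dict[str, List[float]],
                weights: Sequence[float],
                bias: float,
                embedding_dim: int) -> float:
    """Concatenate a precomputed embedding with live features and apply a logistic scorer."""
    # Fall back to a zero vector for accounts with no precomputed embedding yet.
    embedding = embedding_store.get(account_id, [0.0] * embedding_dim)
    features = list(embedding) + list(realtime_features)
    if len(features) != len(weights):
        raise ValueError("weights must match embedding_dim + number of real-time features")
    logit = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


# Embeddings produced by a batch job and stored for lookup (hypothetical values).
embedding_store = {"acct_1": [0.2, -0.1]}
weights = [1.0, 2.0, 0.5]  # two embedding dimensions, then one real-time feature
score = fraud_score("acct_1", [1.5], embedding_store, weights, bias=-0.5, embedding_dim=2)
print(f"fraud score: {score:.3f}")
```

Keeping the scorer this lightweight is the point of the design: the expensive embedding computation stays in the batch job, while the purchase-time path only does a lookup and a small dot product.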
Building models in Kumo SDK
1. Initialize the Kumo SDK
2. Connect data
3. Select tables
4. Create graph schema
5. Train the model