Solution Background and Business Value
Payback abuse is a form of fraud commonly found in buy-now-pay-later (BNPL) platforms. It is closely related to credit card fraud and certain types of insurance fraud. The core challenge in detecting payback abuse is making real-time, transaction-level decisions that stop fraudulent activity before financial losses occur. An effective machine learning (ML) model helps businesses:
- Reduce financial losses by blocking high-risk transactions before they go through.
- Improve fraud detection rates by analyzing transaction patterns.
- Minimize false positives to avoid blocking legitimate users.
Data Requirements and Schema
To build an effective fraud detection model, we need a structured dataset that captures user transactions, payment history, and account details.
Core Tables
- Transactions/Orders Table
  - Stores details about each transaction.
  - Key attributes:
    - order_id: Unique transaction identifier.
    - account_id: Links the transaction to a user account.
    - timestamp: Time of transaction.
    - Optional: Order value, merchant details, transaction type.
- Payments Table
  - Tracks payments made for each transaction.
  - Key attributes:
    - payment_id: Unique payment identifier.
    - order_id: Links the payment to a specific order.
    - timestamp: Time of payment.
    - outstanding_amt: Remaining balance for the order.
    - Optional: Payment method, status.
- Accounts Table
  - Stores user account details.
  - Key attributes:
    - account_id: Unique account identifier.
    - Optional: User demographics, credit history, risk score.
- Merchants Table: Static data about merchants (e.g., reputation, fraud risk).
- Items Table: Information about products involved in transactions.
- Account 360 Table: Aggregated account data (e.g., transaction history, credit checks, previous fraud cases).
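To make the relationships between the core tables concrete, here is a minimal sketch of the three core tables as Python dataclasses, with a small check that every foreign key resolves. The required field names come from the schema above; the optional field names (order_value, merchant_id, payment_method, risk_score) and the check_links helper are illustrative assumptions, not part of any specific platform's API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Order:
    order_id: str                         # unique transaction identifier
    account_id: str                       # links the order to an account
    timestamp: datetime                   # time of transaction
    order_value: Optional[float] = None   # optional attribute (assumed name)
    merchant_id: Optional[str] = None     # optional attribute (assumed name)


@dataclass
class Payment:
    payment_id: str                       # unique payment identifier
    order_id: str                         # links the payment to an order
    timestamp: datetime                   # time of payment
    outstanding_amt: float                # remaining balance for the order
    payment_method: Optional[str] = None  # optional attribute (assumed name)


@dataclass
class Account:
    account_id: str                       # unique account identifier
    risk_score: Optional[float] = None    # optional attribute (assumed name)


def check_links(accounts, orders, payments):
    """Return ids of orders and payments whose foreign key has no match."""
    account_ids = {a.account_id for a in accounts}
    order_ids = {o.order_id for o in orders}
    orphan_orders = [o.order_id for o in orders if o.account_id not in account_ids]
    orphan_payments = [p.payment_id for p in payments if p.order_id not in order_ids]
    return orphan_orders, orphan_payments


# Toy data: order o2 and payment p2 point at records that do not exist.
accounts = [Account("a1")]
orders = [
    Order("o1", "a1", datetime(2024, 1, 1)),
    Order("o2", "a9", datetime(2024, 1, 2)),
]
payments = [
    Payment("p1", "o1", datetime(2024, 1, 1), 100.0),
    Payment("p2", "o7", datetime(2024, 1, 3), 20.0),
]
orphans = check_links(accounts, orders, payments)
```

Running a check like this before training helps catch orders or payments that cannot be joined to an account, since such rows would otherwise drop out of the data silently.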
Predictive Queries
The predictive query depends on how fraudulent transactions are defined. Two approaches are commonly used:
1. Unpaid Orders After X Days
If a fraudulent order is defined as one that remains unpaid after X days, we can train a model to predict this behavior.
- The payments table must include an initial payment record for each order with outstanding_amt = order_value.
- This ensures that a negative label (not fraud) is generated for orders with no remaining balance.
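The "unpaid after X days" definition can be sketched in plain Python as follows. This is an illustration of the labeling logic only, not the platform's predictive query syntax. It assumes that the most recent payment record inside the X-day window carries the current balance, and the function name, the dict-based rows, and the `now` parameter are hypothetical.

```python
from datetime import datetime, timedelta


def unpaid_after_x_days(orders, payments, x_days, now):
    """Label each order: True = unpaid after x_days, False = paid, None = not yet known."""
    by_order = {}
    for p in payments:
        by_order.setdefault(p["order_id"], []).append(p)

    labels = {}
    for o in orders:
        cutoff = o["timestamp"] + timedelta(days=x_days)
        if cutoff > now:
            # The X-day window has not elapsed, so the outcome is still unknown.
            labels[o["order_id"]] = None
            continue
        in_window = [p for p in by_order.get(o["order_id"], [])
                     if p["timestamp"] <= cutoff]
        if not in_window:
            # No initial payment record exists, so no label can be derived.
            labels[o["order_id"]] = None
            continue
        latest = max(in_window, key=lambda p: p["timestamp"])
        labels[o["order_id"]] = latest["outstanding_amt"] > 0
    return labels


orders = [
    {"order_id": "o1", "timestamp": datetime(2024, 1, 1)},
    {"order_id": "o2", "timestamp": datetime(2024, 1, 1)},
    {"order_id": "o3", "timestamp": datetime(2024, 1, 20)},
]
payments = [
    # Initial records with outstanding_amt = order_value, then later payments.
    {"order_id": "o1", "timestamp": datetime(2024, 1, 1), "outstanding_amt": 100.0},
    {"order_id": "o1", "timestamp": datetime(2024, 1, 5), "outstanding_amt": 0.0},
    {"order_id": "o2", "timestamp": datetime(2024, 1, 1), "outstanding_amt": 50.0},
    {"order_id": "o2", "timestamp": datetime(2024, 1, 20), "outstanding_amt": 0.0},
    {"order_id": "o3", "timestamp": datetime(2024, 1, 20), "outstanding_amt": 30.0},
]
labels = unpaid_after_x_days(orders, payments, x_days=14, now=datetime(2024, 1, 25))
# labels == {"o1": False, "o2": True, "o3": None}
```

Order o2 is labeled as unpaid even though it was eventually settled, because the settlement arrived after the 14-day cutoff. Without the initial payment record, an order that never receives a payment would have no rows in the payments table and could not be labeled at all.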
2. Labeled Fraud
fraud_label is a boolean column (1 = fraudulent, 0 = legitimate, None = pending prediction).
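Because fraud_label takes three values, rows need to be separated into labeled examples for training and pending rows for prediction. A minimal sketch, assuming rows are plain dicts and using a hypothetical helper name:

```python
def split_by_label(rows):
    """Separate labeled rows (0 or 1) from rows that still await a prediction."""
    # Compare against None explicitly: a truthiness test would also treat
    # the legitimate label 0 as missing.
    labeled = [r for r in rows if r["fraud_label"] is not None]
    pending = [r for r in rows if r["fraud_label"] is None]
    return labeled, pending


rows = [
    {"order_id": "o1", "fraud_label": 1},
    {"order_id": "o2", "fraud_label": 0},
    {"order_id": "o3", "fraud_label": None},
]
labeled, pending = split_by_label(rows)
fraud_rate = sum(r["fraud_label"] for r in labeled) / len(labeled)
```

The fraud rate is computed over labeled rows only, so pending orders do not dilute it.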
Deployment Strategy
1. Batch Fraud Detection for Inspection Teams
- Suitable for scenarios without strict real-time requirements.
- Predictions are generated in batches (e.g., every hour, daily).
- Fraud analysts can review flagged transactions manually.
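As a rough illustration of the batch workflow, the sketch below turns one batch of model scores into a review queue for analysts. The threshold of 0.8, the function name, and the score dictionary are assumptions chosen for the example. In practice the threshold would be tuned against the false-positive rate the team can tolerate.

```python
def build_review_queue(scores, threshold=0.8, max_items=None):
    """Return (order_id, score) pairs at or above threshold, highest risk first."""
    flagged = [(order_id, s) for order_id, s in scores.items() if s >= threshold]
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    if max_items is not None:
        # Cap the queue to what the inspection team can review per batch.
        flagged = flagged[:max_items]
    return flagged


# Scores produced by one batch run (hypothetical values).
batch_scores = {"o1": 0.95, "o2": 0.10, "o3": 0.85}
queue = build_review_queue(batch_scores)
```

Sorting by descending score means that if analysts cannot clear the whole queue before the next batch, the riskiest orders are still reviewed first.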
2. Real-Time Fraud Detection
- Generate user and transaction embeddings in batches.
- Store embeddings in a feature store for quick retrieval.
- Combine embeddings with real-time transaction features to calculate a fraud risk score at the time of purchase.
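The purchase-time scoring step can be sketched as a lookup followed by a lightweight model. Everything concrete here is an assumption made for illustration: the in-memory dictionary stands in for a feature store, only an account embedding is looked up, the logistic scorer is a placeholder for whatever downstream model is actually used, and the weights are made up.

```python
import math

# Hypothetical stand-in for a feature store, keyed by account_id.
# In production this would be a low-latency key-value lookup.
embedding_store = {"a1": [0.2, -0.1, 0.4]}


def risk_score(account_id, realtime_features, weights, bias, store, embedding_dim=3):
    """Combine a precomputed embedding with real-time features into a score in (0, 1)."""
    embedding = store.get(account_id)
    if embedding is None:
        # Cold start: no embedding has been computed for this account yet.
        embedding = [0.0] * embedding_dim
    x = list(embedding) + list(realtime_features)
    if len(x) != len(weights):
        raise ValueError("weights must match embedding plus real-time features")
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


# Real-time features at purchase (assumed): normalized order value, new-device flag.
weights = [0.5, -0.3, 0.8, 1.2, 0.9]
score = risk_score("a1", [0.7, 1.0], weights, bias=-1.0, store=embedding_store)
```

Keeping the embeddings precomputed means the purchase-time path only performs a lookup and a small dot product, which is what makes a real-time decision feasible.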