Classification · Loss Prevention

Return Prediction

Which orders will be returned?

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example

Which orders will be returned?

E-commerce return rates average 20-30%, costing US retailers $816B in returned merchandise annually (NRF). Each return costs $10-20 to process (shipping, inspection, restocking, depreciation). For a retailer shipping 5M orders per year with a 25% return rate, that is $12.5-25M in annual returns processing costs alone, plus the margin loss on items that cannot be resold at full price. Fashion and apparel returns are the worst offenders at 30-40%, often driven by sizing issues, color mismatches, and impulse buying that could be mitigated with better pre-purchase guidance.
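The processing-cost range above follows from three inputs: order volume, return rate, and cost per return. A quick check in Python, using only the figures quoted in this paragraph (the function name is illustrative):

```python
def annual_processing_cost(orders_per_year, return_rate, cost_per_return):
    """Annual cost of processing returns, in dollars."""
    returned_orders = orders_per_year * return_rate
    return returned_orders * cost_per_return

# Figures from the paragraph above: 5M orders, 25% return rate, $10-20 per return.
low = annual_processing_cost(5_000_000, 0.25, 10)
high = annual_processing_cost(5_000_000, 0.25, 20)
print(f"${low / 1e6:.1f}M to ${high / 1e6:.1f}M")  # $12.5M to $25.0M
```

This covers processing only. The margin loss on items that cannot be resold at full price would come on top and is not modeled here.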

Quick answer

Predicting which orders will be returned requires connecting order details, customer return history, product-level return rates, and sizing data into a single model. A customer who orders two sizes of the same blazer (bracket buying) with a 43% historical return rate on apparel is a very different risk profile than a repeat buyer ordering their known size. Relational models surface these multi-table signals at the moment of purchase, enabling proactive interventions (sizing tools, keep discounts) that reduce return rates by 15-25%. On SAP SALT benchmarks, relational approaches hit 91% accuracy vs 75% for XGBoost on customer behavior prediction tasks.

Approaches compared

4 ways to solve this problem

1. Product-level return rate thresholds

Flag orders containing products with historically high return rates. Simple to implement with a single lookup table.

Best for

Quick wins on the worst-offending products where return rates exceed 30% and the product team can improve sizing guides or descriptions.

Watch out for

Ignores customer-level signals entirely. A loyal customer with a 5% return rate ordering a high-return product is very different from a serial returner ordering the same product.
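A minimal sketch of the threshold approach, using the sample product return rates shown later on this page. The 30% cutoff comes from the text above; the function and variable names are illustrative and not part of any Kumo API.

```python
# Product-level return rates, taken from the sample PRODUCT_RETURN_RATES table.
PRODUCT_RETURN_RATE = {"P-3001": 0.32, "P-3020": 0.08, "P-3050": 0.18}

RETURN_RATE_THRESHOLD = 0.30  # flag products whose return rate exceeds 30%

def flag_order(product_ids, rates=PRODUCT_RETURN_RATE, threshold=RETURN_RATE_THRESHOLD):
    """Return True if any product in the order exceeds the return-rate threshold."""
    # Unknown products default to 0.0, so they are never flagged.
    return any(rates.get(pid, 0.0) > threshold for pid in product_ids)

print(flag_order(["P-3001", "P-3001"]))  # True
print(flag_order(["P-3020"]))            # False
```

Note that the function never receives a customer ID. A loyal customer and a serial returner ordering the same blazer get the same flag, which is exactly the limitation described above.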

2. Logistic regression on order features

Classify orders as return-likely or not based on features like number of items, total value, customer tenure, and shipping method.

Best for

Teams that need an interpretable baseline model to identify the most predictive return risk factors.

Watch out for

Order-level features alone miss the richest signals: bracket buying patterns, customer-product brand familiarity, and product-specific sizing variance. Typical AUC of 0.65-0.72.

3. XGBoost with engineered features

Gradient-boosted trees with hand-built features: customer return rate, product return rate, size-mismatch indicators, bracket-buying flags, and review sentiment scores.

Best for

Analytics teams that can invest 4-6 weeks building return-specific feature pipelines across order, product, and customer data.

Watch out for

Feature engineering for return prediction is brittle. Every new product category (shoes vs. blazers vs. electronics) has different return drivers, so each one needs its own hand-built features. SAP SALT shows a 75% accuracy ceiling for flat tabular models.
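To make the hand-built features concrete, here is a rough sketch of flattening multi-table signals into one feature row per order. The rows come from the sample tables later on this page; the feature names and the `build_features` helper are assumptions for illustration, not a prescribed pipeline.

```python
from collections import defaultdict

# Sample rows from the ORDER_ITEMS table on this page: (order_id, product_id, size).
ORDER_ITEMS = [
    ("ORD-8810", "P-3001", "M"),
    ("ORD-8810", "P-3001", "L"),
    ("ORD-8811", "P-3020", "10"),
]
ORDER_CUSTOMER = {"ORD-8810": "CU-3045", "ORD-8811": "CU-3012"}
CUSTOMER_RETURN_RATE = {"CU-3045": 0.428, "CU-3012": 0.125}
PRODUCT_RETURN_RATE = {"P-3001": 0.32, "P-3020": 0.08}

def build_features(order_id):
    """Flatten multi-table signals into one feature row for a tabular model."""
    sizes_by_product = defaultdict(set)
    for oid, product_id, size in ORDER_ITEMS:
        if oid == order_id:
            sizes_by_product[product_id].add(size)
    return {
        # Bracket buying: the same product ordered in more than one size.
        "bracket_buying": any(len(sizes) > 1 for sizes in sizes_by_product.values()),
        "customer_return_rate": CUSTOMER_RETURN_RATE[ORDER_CUSTOMER[order_id]],
        "max_product_return_rate": max(
            (PRODUCT_RETURN_RATE[p] for p in sizes_by_product), default=0.0
        ),
    }

print(build_features("ORD-8810"))
print(build_features("ORD-8811"))
```

The bracket-buying flag is specific to sized apparel. A shoe or electronics category would need different flags, each written and maintained by hand.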

4. KumoRFM (relational foundation model)

Connects orders, order items, customer return history, product return rates, sizing data, and browsing behavior into a relational graph. Predicts return probability at the moment of purchase.

Best for

Fashion and apparel retailers where return rates exceed 25% and the return drivers span multiple data sources (sizing, customer history, product reviews, browsing behavior).

Watch out for

Requires connected order-item-customer-product data. If you only have aggregate return rates without order-level detail, start with product-level thresholds.

Key metric: SAP SALT customer behavior: relational 91% vs XGBoost 75%. Proactive interventions reduce return rates by 15-25%, saving $10-20 per prevented return.
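The savings implied by this metric can be estimated with simple arithmetic. The sketch below assumes the 15-25% figure is a relative reduction in the number of returns, and reuses the 5M-order, 25% return rate example from earlier on this page; the function name is illustrative.

```python
def prevented_return_savings(orders, return_rate, relative_reduction, cost_per_return):
    """Dollars saved per year when a share of would-be returns is prevented."""
    baseline_returns = orders * return_rate
    prevented_returns = baseline_returns * relative_reduction
    return prevented_returns * cost_per_return

# Low end: 15% fewer returns at $10 each. High end: 25% fewer returns at $20 each.
low = prevented_return_savings(5_000_000, 0.25, 0.15, 10)
high = prevented_return_savings(5_000_000, 0.25, 0.25, 20)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Under these assumptions the result is roughly $1.9M to $6.3M per year in avoided processing costs, before subtracting whatever the interventions themselves cost.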

Why relational data changes the answer

A flat return model sees that Order ORD-8810 has 2 items worth $189.98 from a customer with a 43% return rate. It predicts a return. But it cannot see that the 2 items are the same blazer in sizes M and L (bracket buying), that this product has a known M-L boundary sizing issue with 22% size-related returns, and that the customer has never purchased this brand before. These signals live in order_items, product_return_rates, and customer_purchase_history tables respectively.

A relational model connects these tables and learns the compounding risk factors. Bracket buying + high product return rate + first-time brand purchase = 82% return probability. More importantly, the model identifies the specific intervention: offering a virtual fitting tool that resolves the M-L sizing question before shipping both units reduces the return probability from 82% to 35%. Single-table models can predict the return but cannot pinpoint the intervention because they do not see the underlying cause.

Predicting returns from order totals alone is like a doctor diagnosing illness from body temperature alone. A 101°F fever could be a cold or appendicitis. A relational return model is like a doctor who checks your temperature (order value), asks about symptoms (bracket buying pattern), reviews your medical history (customer return rate), and examines the specific area of concern (product sizing variance). Same patient, dramatically better diagnosis and treatment plan.

How KumoRFM solves this

Relational intelligence built for retail and e-commerce data

Kumo connects orders, products, customer return history, sizing data, product reviews, and browsing behavior into a relational graph. The model predicts at the moment of purchase that Order ORD-8810 has an 82% return probability because the customer ordered two sizes (bracket buying), has returned about 43% of past apparel purchases, and this product has a 4.1x higher return rate in the ordered size range. Before both units ship, the retailer can proactively offer a virtual fitting tool, a sizing recommendation, or a small discount for keeping only the right size.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

1. Your data

The relational tables Kumo learns from

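ORDERS

| order_id | customer_id | total | items | shipping_method | timestamp |
|---|---|---|---|---|---|
| ORD-8810 | CU-3045 | $189.98 | 2 | Standard | 2025-09-14 |
| ORD-8811 | CU-3012 | $42.99 | 1 | Express | 2025-09-14 |
| ORD-8812 | CU-3078 | $67.50 | 3 | Standard | 2025-09-15 |

ORDER_ITEMS

| order_id | product_id | product_name | size | color | price |
|---|---|---|---|---|---|
| ORD-8810 | P-3001 | Slim Fit Blazer | M | Navy | $94.99 |
| ORD-8810 | P-3001 | Slim Fit Blazer | L | Navy | $94.99 |
| ORD-8811 | P-3020 | Running Shoes | 10 | Black | $42.99 |

CUSTOMER_RETURN_HISTORY

| customer_id | total_orders | total_returns | return_rate | common_reason |
|---|---|---|---|---|
| CU-3045 | 14 | 6 | 42.8% | Wrong Size |
| CU-3012 | 24 | 3 | 12.5% | Changed Mind |
| CU-3078 | 1 | 0 | 0% | N/A |

PRODUCT_RETURN_RATES

| product_id | overall_return_rate | size_issue_rate | size_range_variance |
|---|---|---|---|
| P-3001 | 32% | 22% | High (M-L boundary) |
| P-3020 | 8% | 3% | Low |
| P-3050 | 18% | 12% | Medium |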
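The tables above are linked by shared keys (order_id, customer_id, product_id), which is what lets them be treated as a graph. The sketch below is only an illustration of that idea in plain Python, not a description of how KumoRFM is implemented; the node naming scheme and helper functions are assumptions.

```python
from collections import defaultdict

# Foreign-key links from the sample tables: each pair is (child row, parent row).
EDGES = [
    ("order:ORD-8810", "customer:CU-3045"),
    ("order:ORD-8811", "customer:CU-3012"),
    ("order:ORD-8812", "customer:CU-3078"),
    ("item:ORD-8810/M", "order:ORD-8810"),
    ("item:ORD-8810/L", "order:ORD-8810"),
    ("item:ORD-8811/10", "order:ORD-8811"),
    ("item:ORD-8810/M", "product:P-3001"),
    ("item:ORD-8810/L", "product:P-3001"),
    ("item:ORD-8811/10", "product:P-3020"),
]

def build_graph(edges):
    """Undirected adjacency sets built from key links."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def neighborhood(graph, start, hops):
    """Nodes reachable from `start` within `hops` edges (breadth-first)."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen

graph = build_graph(EDGES)
reachable = neighborhood(graph, "order:ORD-8810", hops=2)
print(sorted(reachable))
```

Within two hops of ORD-8810 sit the customer row, both blazer line items, and the P-3001 product row. Those are the rows that hold the return rate, the bracket-buying pattern, and the sizing variance.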
2. Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT BOOL(ORDERS.RETURNED = 'True', 0, 30, days)
FOR EACH ORDERS.ORDER_ID
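One way to read the query above is as a binary label per order: was the order returned within 30 days? The sketch below builds such a label in plain Python. It is an interpretation of the target for illustration, not a statement of Kumo's PQL semantics, and the return log with its dates is hypothetical.

```python
from datetime import date, timedelta

# Order dates from the ORDERS table above.
ORDERS = {
    "ORD-8810": date(2025, 9, 14),
    "ORD-8811": date(2025, 9, 14),
    "ORD-8812": date(2025, 9, 15),
}
# Hypothetical return log (order_id, return date); not part of the original data.
RETURNS = [("ORD-8810", date(2025, 9, 25)), ("ORD-8812", date(2025, 11, 2))]

def label_returned(orders, returns, window_days=30):
    """Map each order_id to True if it was returned within the window."""
    window = timedelta(days=window_days)
    labels = {order_id: False for order_id in orders}
    for order_id, returned_on in returns:
        ordered_on = orders[order_id]
        if ordered_on <= returned_on <= ordered_on + window:
            labels[order_id] = True
    return labels

print(label_returned(ORDERS, RETURNS))
```

In this toy log, ORD-8810 comes back after 11 days and is labeled True, while ORD-8812 comes back after 48 days and falls outside the 30-day window.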
3. Prediction output

Every entity gets a score, updated continuously

| ORDER_ID | CUSTOMER | RETURN_PROB | PRIMARY_RISK | INTERVENTION |
|---|---|---|---|---|
| ORD-8810 | CU-3045 | 0.82 | Bracket Buying | Sizing Tool + Keep Discount |
| ORD-8812 | CU-3078 | 0.24 | New Customer | Standard Follow-up |
| ORD-8811 | CU-3012 | 0.09 | Low Risk | None |
4. Understand why

Every prediction includes feature attributions — no black boxes

Order ORD-8810 (2x Slim Fit Blazer, M & L)

Predicted: 82% return probability

Top contributing features

| Feature | Value | Attribution |
|---|---|---|
| Bracket buying pattern (2 sizes) | M and L | 32% |
| Customer historical return rate | 42.8% | 26% |
| Product size-boundary variance | High (M-L) | 20% |
| Common return reason match | Wrong Size | 13% |
| No prior purchase of this brand | First time | 9% |

Feature attributions are computed automatically for every prediction. No separate tooling required.

Frequently asked questions

Common questions about return prediction

What is the average e-commerce return rate?

Overall e-commerce return rates average 20-30%, with fashion and apparel at 30-40% and electronics at 10-15% (NRF). Each return costs $10-20 to process (shipping, inspection, restocking). For a retailer shipping 5M orders per year with a 25% return rate, that is $12.5-25M in annual processing costs alone, plus margin loss on items that cannot be resold at full price.

Can AI actually reduce return rates or just predict them?

Both. Prediction enables intervention. When the model identifies a high-return-risk order at checkout, the retailer can offer a virtual fitting tool (resolves sizing uncertainty), a 'keep the right one' discount (incentivizes keeping rather than returning), or enhanced product information (better photos, size charts). These proactive interventions reduce return rates by 15-25%. The key is catching the risk before shipping, when the cost of intervention is a $3 discount vs a $15 return processing fee.
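The trade-off between a small up-front cost and an avoided return can be written as an expected value. The sketch below uses the $15 fee and $3 discount quoted above, and borrows the 82% to 35% drop from the blazer example earlier on this page. Pairing that drop with a $3 cost is an assumption made for illustration, as is paying the cost on every flagged order.

```python
def expected_net_saving(p_before, p_after, return_cost, intervention_cost):
    """Expected dollars saved per order by intervening before shipment."""
    avoided_return_cost = (p_before - p_after) * return_cost
    return avoided_return_cost - intervention_cost

def worth_intervening(p_before, p_after, return_cost=15, intervention_cost=3):
    """True when the expected avoided cost exceeds the intervention cost."""
    return expected_net_saving(p_before, p_after, return_cost, intervention_cost) > 0

saving = expected_net_saving(0.82, 0.35, return_cost=15, intervention_cost=3)
print(f"${saving:.2f} expected net saving per order")  # $4.05 expected net saving per order
print(worth_intervening(0.09, 0.05))  # False: a low-risk order does not justify the cost
```

With these numbers the intervention pays off only when it lowers the return probability by more than 20 percentage points ($3 / $15), which is why scoring each order matters more than applying the same offer to everyone.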

How does return prediction handle serial returners?

Relational models naturally identify serial returners because they connect the current order to the full return history graph. A customer with a 43% return rate who orders bracket sizes triggers a different intervention (mandatory sizing tool, reduced free return eligibility) than a loyal customer with a 5% return rate who is ordering a gift in an unfamiliar category. The model surfaces the customer-specific risk profile, not just the order-level prediction.

Bottom line: Reduce return rates by 15-25% through proactive sizing tools and targeted interventions. For a 5M-order retailer with $12.5-25M in annual returns processing costs, preventing 15-25% of returns at $10-20 each saves roughly $1.9-6.3M per year.

Topics covered

return prediction AI, e-commerce returns analytics, order return prediction, reverse logistics AI, graph neural network returns, KumoRFM, relational deep learning retail, return rate reduction, product return forecasting, retail loss prevention AI

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
