Return Prediction
“Which orders will be returned?”
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.
A real-world example
Which orders will be returned?
E-commerce return rates average 20-30%, costing US retailers $816B in returned merchandise annually (NRF). Each return costs $10-20 to process (shipping, inspection, restocking, depreciation). For a retailer shipping 5M orders per year with a 25% return rate, that is $12.5-25M in annual returns processing costs alone, plus the margin loss on items that cannot be resold at full price. Fashion and apparel returns are the worst offenders at 30-40%, often driven by sizing issues, color mismatches, and impulse buying that could be mitigated with better pre-purchase guidance.
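The processing-cost range above is simple arithmetic, and a short sketch makes it easy to re-run with your own volumes. The function name and parameters here are illustrative, not part of any Kumo API; the inputs are the figures quoted in the paragraph.

```python
def annual_processing_cost(orders_per_year, return_rate, cost_per_return):
    """Yearly cost of processing returns, in dollars."""
    returned_orders = orders_per_year * return_rate
    return returned_orders * cost_per_return

# 5M orders per year, 25% return rate, $10-20 per return (figures from the text).
low = annual_processing_cost(5_000_000, 0.25, 10)
high = annual_processing_cost(5_000_000, 0.25, 20)
print(f"${low / 1e6:.1f}M to ${high / 1e6:.1f}M per year")
```

This covers processing cost only; the margin loss on items that cannot be resold at full price would come on top.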
Quick answer
Predicting which orders will be returned requires connecting order details, customer return history, product-level return rates, and sizing data into a single model. A customer who orders two sizes of the same blazer (bracket buying) with a 43% historical return rate on apparel is a very different risk profile than a repeat buyer ordering their known size. Relational models surface these multi-table signals at the moment of purchase, enabling proactive interventions (sizing tools, keep discounts) that reduce return rates by 15-25%. On SAP SALT benchmarks, relational approaches hit 91% accuracy vs 75% for XGBoost on customer behavior prediction tasks.
Approaches compared
4 ways to solve this problem
1. Product-level return rate thresholds
Flag orders containing products with historically high return rates. Simple to implement with a single lookup table.
Best for
Quick wins on the worst-offending products where return rates exceed 30% and the product team can improve sizing guides or descriptions.
Watch out for
Ignores customer-level signals entirely. A loyal customer with a 5% return rate ordering a high-return product is very different from a serial returner ordering the same product.
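A minimal sketch of the lookup-table approach, assuming the product return rates from the sample PRODUCT_RETURN_RATES table further down this page. The 30% cutoff and the function name are hypothetical choices for illustration.

```python
# Product-level return rates from the sample PRODUCT_RETURN_RATES table.
PRODUCT_RETURN_RATE = {"P-3001": 0.32, "P-3020": 0.08, "P-3050": 0.18}
THRESHOLD = 0.30  # hypothetical cutoff, echoing the "exceed 30%" guidance

def flag_order(product_ids, rates=PRODUCT_RETURN_RATE, threshold=THRESHOLD):
    """Return the distinct products in an order whose return rate exceeds the threshold."""
    return sorted({p for p in product_ids if rates.get(p, 0.0) > threshold})

print(flag_order(["P-3001", "P-3001"]))  # items in ORD-8810
print(flag_order(["P-3020"]))            # items in ORD-8811
```

Note that the function never looks at who placed the order, which is exactly the limitation described above.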
2. Logistic regression on order features
Classify orders as return-likely or not based on features like number of items, total value, customer tenure, and shipping method.
Best for
Teams that need an interpretable baseline model to identify the most predictive return risk factors.
Watch out for
Order-level features alone miss the richest signals: bracket buying patterns, customer-product brand familiarity, and product-specific sizing variance. Typical AUC of 0.65-0.72.
3. XGBoost with engineered features
Gradient-boosted trees with hand-built features: customer return rate, product return rate, size-mismatch indicators, bracket-buying flags, and review sentiment scores.
Best for
Analytics teams that can invest 4-6 weeks building return-specific feature pipelines across order, product, and customer data.
Watch out for
Feature engineering for return prediction is brittle. Every new product category (shoes vs. blazers vs. electronics) has different return drivers. SAP SALT shows a 75% accuracy ceiling for flat tabular models.
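To give a sense of what one hand-built feature involves, here is a sketch of a bracket-buying flag, assuming rows shaped like the sample ORDER_ITEMS table. The function name and the "two or more sizes of the same product" rule are assumptions for illustration; a production pipeline would also need category-specific handling (for example, shoe half-sizes).

```python
from collections import defaultdict

# Rows copied from the sample ORDER_ITEMS table: (order_id, product_id, size).
ORDER_ITEMS = [
    ("ORD-8810", "P-3001", "M"),
    ("ORD-8810", "P-3001", "L"),
    ("ORD-8811", "P-3020", "10"),
]

def bracket_buying_flags(order_items):
    """Map order_id -> True if any product in the order appears in 2+ sizes."""
    sizes = defaultdict(set)  # (order_id, product_id) -> distinct sizes ordered
    for order_id, product_id, size in order_items:
        sizes[(order_id, product_id)].add(size)

    flags = defaultdict(bool)
    for (order_id, _product_id), size_set in sizes.items():
        flags[order_id] = flags[order_id] or len(size_set) >= 2
    return dict(flags)

print(bracket_buying_flags(ORDER_ITEMS))
```

Each feature of this kind (customer return rate, size-mismatch indicators, review sentiment) needs its own logic and maintenance, which is where the 4-6 weeks of pipeline work goes.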
4. KumoRFM (relational foundation model)
Connects orders, order items, customer return history, product return rates, sizing data, and browsing behavior into a relational graph. Predicts return probability at the moment of purchase.
Best for
Fashion and apparel retailers where return rates exceed 25% and the return drivers span multiple data sources (sizing, customer history, product reviews, browsing behavior).
Watch out for
Requires connected order-item-customer-product data. If you only have aggregate return rates without order-level detail, start with product-level thresholds.
Key metric: SAP SALT customer behavior: relational 91% vs XGBoost 75%. Proactive interventions reduce return rates by 15-25%, saving $10-20 per prevented return.
Why relational data changes the answer
A flat return model sees that Order ORD-8810 has 2 items worth $189.98 from a customer with a 43% return rate. It predicts a return. But it cannot see that the 2 items are the same blazer in sizes M and L (bracket buying), that this product has a known M-L boundary sizing issue with 22% size-related returns, and that the customer has never purchased this brand before. These signals live in order_items, product_return_rates, and customer_purchase_history tables respectively.
A relational model connects these tables and learns the compounding risk factors. Bracket buying + high product return rate + first-time brand purchase = 82% return probability. More importantly, the model identifies the specific intervention: offering a virtual fitting tool that resolves the M-L sizing question before shipping both units reduces the return probability from 82% to 35%. Single-table models can predict the return but cannot pinpoint the intervention because they do not see the underlying cause.
Predicting returns from order totals alone is like a doctor diagnosing illness from body temperature. A 101°F fever could be a cold or appendicitis. A relational return model is like a doctor who checks your temperature (order value), asks about symptoms (bracket buying pattern), reviews your medical history (customer return rate), and examines the specific area of concern (product sizing variance). Same patient, dramatically better diagnosis and treatment plan.
How KumoRFM solves this
Relational intelligence built for retail and e-commerce data
Kumo connects orders, products, customer return history, sizing data, product reviews, and browsing behavior into a relational graph. The model predicts at the moment of purchase that Order ORD-8810 has an 82% return probability because the customer ordered two sizes (bracket buying), has returned 43% of past apparel purchases, and this product has a 4.1x higher return rate in the ordered size range. The retailer can proactively offer a virtual fitting tool, a sizing recommendation, or a small discount for committing to one size before both units ship.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
ORDERS
| order_id | customer_id | total | items | shipping_method | timestamp |
|---|---|---|---|---|---|
| ORD-8810 | CU-3045 | $189.98 | 2 | Standard | 2025-09-14 |
| ORD-8811 | CU-3012 | $42.99 | 1 | Express | 2025-09-14 |
| ORD-8812 | CU-3078 | $67.50 | 3 | Standard | 2025-09-15 |
ORDER_ITEMS
| order_id | product_id | product_name | size | color | price |
|---|---|---|---|---|---|
| ORD-8810 | P-3001 | Slim Fit Blazer | M | Navy | $94.99 |
| ORD-8810 | P-3001 | Slim Fit Blazer | L | Navy | $94.99 |
| ORD-8811 | P-3020 | Running Shoes | 10 | Black | $42.99 |
CUSTOMER_RETURN_HISTORY
| customer_id | total_orders | total_returns | return_rate | common_reason |
|---|---|---|---|---|
| CU-3045 | 14 | 6 | 42.8% | Wrong Size |
| CU-3012 | 24 | 3 | 12.5% | Changed Mind |
| CU-3078 | 1 | 0 | 0% | N/A |
PRODUCT_RETURN_RATES
| product_id | overall_return_rate | size_issue_rate | size_range_variance |
|---|---|---|---|
| P-3001 | 32% | 22% | High (M-L boundary) |
| P-3020 | 8% | 3% | Low |
| P-3050 | 18% | 12% | Medium |
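The four tables connect through order_id, customer_id, and product_id. The sketch below follows those keys by hand for ORD-8810, using values from the sample rows. It is plain dictionary lookups meant to show which tables a relational model has to traverse; it is not how Kumo builds or queries its graph, and the function name is hypothetical.

```python
# Sample rows from the tables above, keyed by their primary keys.
ORDERS = {"ORD-8810": {"customer_id": "CU-3045", "total": 189.98}}
ORDER_ITEMS = [
    {"order_id": "ORD-8810", "product_id": "P-3001", "size": "M"},
    {"order_id": "ORD-8810", "product_id": "P-3001", "size": "L"},
]
CUSTOMER_RETURN_HISTORY = {
    "CU-3045": {"return_rate": 0.428, "common_reason": "Wrong Size"},
}
PRODUCT_RETURN_RATES = {
    "P-3001": {"overall_return_rate": 0.32, "size_issue_rate": 0.22},
}

def order_context(order_id):
    """Collect the cross-table signals linked to one order."""
    order = ORDERS[order_id]
    items = [i for i in ORDER_ITEMS if i["order_id"] == order_id]
    product_ids = sorted({i["product_id"] for i in items})
    return {
        "order_id": order_id,
        "customer": CUSTOMER_RETURN_HISTORY[order["customer_id"]],
        "sizes_by_product": {
            p: sorted(i["size"] for i in items if i["product_id"] == p)
            for p in product_ids
        },
        "products": {p: PRODUCT_RETURN_RATES[p] for p in product_ids},
    }

ctx = order_context("ORD-8810")
print(ctx["sizes_by_product"])  # two sizes of one product: bracket buying
print(ctx["customer"]["common_reason"], ctx["products"]["P-3001"]["size_issue_rate"])
```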
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT BOOL(ORDERS.RETURNED = 'True', 0, 30, days) FOR EACH ORDERS.ORDER_ID
Prediction output
Every entity gets a score, updated continuously
| ORDER_ID | CUSTOMER | RETURN_PROB | PRIMARY_RISK | INTERVENTION |
|---|---|---|---|---|
| ORD-8810 | CU-3045 | 0.82 | Bracket Buying | Sizing Tool + Keep Discount |
| ORD-8812 | CU-3078 | 0.24 | New Customer | Standard Follow-up |
| ORD-8811 | CU-3012 | 0.09 | Low Risk | None |
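Downstream systems typically turn a score into an action. The sketch below shows one simple way to do that with probability thresholds. The cutoffs (0.50 and 0.15) and the function name are hypothetical; the table above does not state how the INTERVENTION column is derived, and it likely depends on the primary risk factor as well as the score.

```python
def choose_intervention(return_prob, high=0.50, medium=0.15):
    """Map a return probability to an action. Thresholds are hypothetical."""
    if return_prob >= high:
        return "Sizing Tool + Keep Discount"
    if return_prob >= medium:
        return "Standard Follow-up"
    return "None"

# Scores from the prediction output table.
scores = {"ORD-8810": 0.82, "ORD-8812": 0.24, "ORD-8811": 0.09}
for order_id, prob in scores.items():
    print(order_id, choose_intervention(prob))
```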
Understand why
Every prediction includes feature attributions — no black boxes
Order ORD-8810 (2x Slim Fit Blazer, M & L)
Predicted: 82% return probability
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| Bracket buying pattern (2 sizes) | M and L | 32% |
| Customer historical return rate | 42.8% | 26% |
| Product size-boundary variance | High (M-L) | 20% |
| Common return reason match | Wrong Size | 13% |
| No prior purchase of this brand | First time | 9% |
Feature attributions are computed automatically for every prediction. No separate tooling required.
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about return prediction
What is the average e-commerce return rate?
Overall e-commerce return rates average 20-30%, with fashion and apparel at 30-40% and electronics at 10-15% (NRF). Each return costs $10-20 to process (shipping, inspection, restocking). For a retailer shipping 5M orders per year with a 25% return rate, that is $12.5-25M in annual processing costs alone, plus margin loss on items that cannot be resold at full price.
Can AI actually reduce return rates or just predict them?
Both. Prediction enables intervention. When the model identifies a high-return-risk order at checkout, the retailer can offer a virtual fitting tool (resolves sizing uncertainty), a 'keep the right one' discount (incentivizes keeping rather than returning), or enhanced product information (better photos, size charts). These proactive interventions reduce return rates by 15-25%. The key is catching the risk before shipping, when the cost of intervention is a $3 discount vs a $15 return processing fee.
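The trade-off between a $3 discount and a $15 return fee can be written as an expected-cost comparison. This sketch uses the 82% to 35% drop in return probability quoted earlier on this page for ORD-8810; the function name is illustrative, and it assumes the discount is paid on every flagged order whether or not the item is eventually returned.

```python
def expected_cost(return_prob, return_fee=15.0, intervention_cost=0.0):
    """Expected per-order cost: intervention spend plus probable return fee."""
    return intervention_cost + return_prob * return_fee

baseline = expected_cost(0.82)                                  # no intervention
with_intervention = expected_cost(0.35, intervention_cost=3.0)  # $3 discount
saving = baseline - with_intervention

print(f"baseline ${baseline:.2f}, with intervention ${with_intervention:.2f}, "
      f"saving ${saving:.2f} per order")
```

Under these assumptions the intervention pays for itself whenever it lowers the return probability by more than 3 / 15 = 20 percentage points, so it is worth reserving for orders where the model expects a large effect.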
How does return prediction handle serial returners?
Relational models naturally identify serial returners because they connect the current order to the full return history graph. A customer with a 43% return rate who orders bracket sizes triggers a different intervention (mandatory sizing tool, reduced free return eligibility) than a loyal customer with a 5% return rate who is ordering a gift in an unfamiliar category. The model surfaces the customer-specific risk profile, not just the order-level prediction.
Bottom line: Proactive sizing tools and targeted interventions can reduce return rates by 15-25%. For a 5M-order retailer with $12.5-25M in annual returns processing costs, a 15-25% relative reduction works out to roughly $1.9-6.3M saved per year.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.