In 2023, Amazon generated $574 billion in net revenue. The company has publicly stated that 35% of its revenue is driven by its recommendation engine. That is roughly $200 billion influenced by AI-powered personalization.
Most retailers know this. Most have invested in recommendation systems. Yet the gap between Amazon's personalization quality and the rest of the industry remains enormous. The reason is not compute budget or team size. It is data architecture. Amazon's models reason over the full product-customer-transaction-review-category graph. Most retailers still run collaborative filtering on a single user-item interaction matrix.
product_catalog — sample retail data
| product_id | name | category | brand | price | avg_rating |
|---|---|---|---|---|---|
| P-3001 | Standing Desk Pro | Furniture | ErgoMax | $549 | 4.7 |
| P-3002 | Blue-Light Monitor | Electronics | ViewClear | $389 | 4.5 |
| P-3003 | Ergonomic Keyboard | Electronics | TypeWell | $129 | 4.8 |
| P-3004 | Premium Office Chair | Furniture | ErgoMax | $799 | 4.6 |
| P-3005 | Desk Lamp LED | Lighting | BrightWork | $79 | 4.3 |
Five products spanning three categories. A collaborative filtering model sees them as unrelated. The product-customer-transaction graph reveals they cluster around a "home office upgrade" behavior pattern.
recommendation_comparison — collaborative filtering vs graph
| Metric | Collaborative Filtering | Graph-Based (KumoRFM) | Improvement |
|---|---|---|---|
| Click-through rate | 2.1% | 3.8% | +81% |
| Revenue per impression | $0.42 | $1.14 | +171% |
| Cross-category discovery | 8% of recs | 34% of recs | +325% |
| Cold-start product coverage | 12% | 78% | +550% |
| Average basket size lift | — | +14% | New signal |
Graph-based recommendations drive 2-3x higher revenue per impression because they discover cross-category connections that expand the customer's consideration set.
The recommendation gap
Collaborative filtering was the state of the art in 2010. It works by finding users with similar purchase histories and recommending items that similar users bought. It is effective, well-understood, and cheap to run. It is also fundamentally limited.
A collaborative filtering model sees two dimensions: users and items. It does not know that a product belongs to the "organic" category, that the brand has a 4.8 satisfaction rating, that the user returned their last three purchases from a different brand, or that users who bought this item frequently also subscribe to a specific type of meal kit. All of that information exists in the retailer's database. It just sits in different tables.
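To make the two-dimensional limitation concrete, here is a minimal user-based collaborative filtering sketch. The interaction matrix is invented toy data (not from any real retailer), and the cosine-similarity recommender is a textbook baseline, not any specific vendor's implementation:

```python
import numpy as np

# Toy user-item interaction matrix (1 = purchased), illustrative only.
# Columns correspond to the five catalog items P-3001..P-3005 above.
interactions = np.array([
    [1, 1, 1, 0, 0],  # U0: desk, monitor, keyboard
    [1, 1, 1, 1, 0],  # U1: desk, monitor, keyboard, chair
    [0, 0, 1, 0, 1],  # U2: keyboard, lamp
    [1, 0, 1, 1, 0],  # U3: desk, keyboard, chair
])

def recommend(user, matrix, top_k=1):
    """User-based collaborative filtering via cosine similarity."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.clip(norms, 1e-9, None)
    sims = unit @ unit[user]            # similarity of every user to `user`
    sims[user] = 0.0                    # exclude the user themselves
    scores = sims @ matrix              # similarity-weighted vote over items
    scores = scores.astype(float)
    scores[matrix[user] > 0] = -np.inf  # mask items already bought
    return np.argsort(scores)[::-1][:top_k]

print(recommend(0, interactions))  # → [3]: the office chair, column index 3
```

Note what the model never sees: brands, categories, prices, returns, reviews. Everything outside the matrix is invisible to it, which is exactly the gap the graph approach closes.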
Graph-based recommendation models operate on the full relational structure. Users, items, transactions, categories, brands, reviews, returns, sessions, wishlists, cart additions, and more. Each entity is a node. Each foreign-key relationship is an edge. The model learns which paths through this graph predict future purchases.
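The node-and-edge construction can be sketched in a few lines. The rows below are invented to mirror this article's tables, and the breadth-first search stands in for the multi-hop reasoning a trained graph model performs:

```python
from collections import defaultdict, deque

# Minimal relational rows mirroring the tables in this article (illustrative).
products = {"P-3001": "Furniture", "P-3004": "Furniture", "P-3003": "Electronics"}
transactions = [("C-8001", "P-3001"), ("C-8001", "P-3003"), ("C-8002", "P-3004")]

# Each foreign-key relationship becomes an undirected edge.
graph = defaultdict(set)
for pid, category in products.items():
    graph[pid].add(category); graph[category].add(pid)   # product -> category FK
for cid, pid in transactions:
    graph[cid].add(pid); graph[pid].add(cid)             # transaction FKs

def shortest_path(start, goal):
    """Breadth-first search over the relational graph."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

# C-8001 never bought the chair, yet a 3-hop path connects them:
print(shortest_path("C-8001", "P-3004"))
# → ['C-8001', 'P-3001', 'Furniture', 'P-3004']
```

The path customer → desk → Furniture → chair is precisely the kind of structure a flat user-item matrix cannot represent.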
DoorDash deployed this approach for restaurant recommendations and saw a 1.8% engagement lift across 30 million users. In the delivery business, where margins are thin and order frequency is the primary growth lever, 1.8% across 30 million users translates to millions in annual revenue.
Cross-category discovery
The most valuable recommendations are the ones customers do not expect. Collaborative filtering excels at recommending more of what you already buy: you bought running shoes, here are more running shoes. Graph-based models discover cross-category connections.
A customer who bought a standing desk, a blue-light-blocking monitor, and an ergonomic keyboard might be an excellent candidate for a premium office chair. The products span three categories. No single-table model connects them. But in the product-category-brand graph, these items cluster around a "home office upgrade" behavior pattern that the model learns from thousands of similar customers.
McKinsey reports that cross-category recommendations generate 2-3x higher revenue per impression than within-category recommendations, because they expand the customer's consideration set rather than competing with items the customer was already going to buy.
Demand forecasting: from isolated products to networks
Traditional demand forecasting treats each product as an independent time series. Historical sales, seasonality, promotions, and sometimes external signals like weather or events. The models are mature (ARIMA, Prophet, gradient-boosted trees) and work reasonably well for stable products with long histories.
They fail in three scenarios that account for the majority of forecast errors: new products with no history, products with strong substitution or complementary effects, and disruptions that propagate through the supply network.
Substitution and complementary effects
When a popular brand of pasta goes out of stock, demand for competing brands spikes. Here is what that looks like in the data:
daily_sales — substitution event
| date | product | store | units_sold | in_stock |
|---|---|---|---|---|
| Mar 1 | Barilla Penne 16oz | Store #412 | 84 | Yes |
| Mar 2 | Barilla Penne 16oz | Store #412 | 91 | Yes |
| Mar 3 | Barilla Penne 16oz | Store #412 | 0 | No (stockout) |
| Mar 3 | DeCecco Penne 16oz | Store #412 | 67 | Yes |
| Mar 3 | Store Brand Penne 16oz | Store #412 | 42 | Yes |
| Mar 4 | DeCecco Penne 16oz | Store #412 | 58 | Yes |
| Mar 4 | Store Brand Penne 16oz | Store #412 | 38 | Yes |
Barilla sells 84-91 units daily, then stocks out on Mar 3. DeCecco jumps from its baseline of 25 to 67 units. Store brand jumps from 18 to 42. An independent forecast model predicted 25 and 18 for Mar 3, missing the substitution spike by 168% and 133%.
forecast_accuracy — independent vs network-aware
| product | independent_forecast | network_forecast | actual | error_reduction |
|---|---|---|---|---|
| DeCecco Penne | 25 units | 61 units | 67 units | 86% reduction |
| Store Brand Penne | 18 units | 39 units | 42 units | 88% reduction |
| Barilla Spaghetti (nearby) | 40 units | 52 units | 55 units | 80% reduction |
Network-aware models detect the substitution chain: Barilla stockout → DeCecco and store brand surge → nearby Barilla formats also see spillover demand.
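A stripped-down sketch of the adjustment a network-aware model makes during a stockout. The baselines and demand-transfer rates below are illustrative assumptions chosen to match the tables above; in practice they would be learned from historical stockout events in the graph:

```python
# Baseline daily units per product (illustrative, matching the tables above).
baseline = {"DeCecco Penne": 25, "Store Brand Penne": 18, "Barilla Penne": 88}

# Assumed share of Barilla's displaced demand each substitute absorbs.
transfer = {"DeCecco Penne": 0.41, "Store Brand Penne": 0.24}

def forecast(product, barilla_in_stock):
    """Baseline forecast, plus redirected demand when the substitute is out."""
    units = baseline[product]
    if not barilla_in_stock and product in transfer:
        units += transfer[product] * baseline["Barilla Penne"]
    return round(units)

print(forecast("DeCecco Penne", barilla_in_stock=False))      # → 61
print(forecast("Store Brand Penne", barilla_in_stock=False))  # → 39
```

An independent per-product model has no `barilla_in_stock` input at all, which is why it predicts the flat baselines of 25 and 18 on the stockout day.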
Walmart has reported that incorporating cross-product network effects into demand forecasting reduced forecast error by 20-30% for products with strong substitution patterns. For a retailer managing 100,000 SKUs, a 20% reduction in forecast error translates to millions in reduced stockouts and overstock costs.
New product forecasting
Cold-start is the Achilles heel of traditional forecasting. A new product has no sales history. The standard approach is to map it to a similar existing product and use that product's history as a proxy. This works poorly because "similar" is hard to define and because it ignores the network context.
A graph-based model forecasts new products by leveraging their position in the product graph: category, brand, price tier, attributes, and the purchasing patterns of customers who browse or wishlist the product. The RelBench benchmark includes an Amazon product dataset with this exact cold-start challenge, and graph models outperform flat baselines by a significant margin.
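The neighborhood idea can be sketched as a similarity-weighted average over existing products that share graph attributes with the new item. The catalog rows and attribute weights are assumptions for illustration, not fitted parameters:

```python
# Existing products: (category, brand, price_tier, weekly_units). Toy data.
existing = [
    ("Furniture", "ErgoMax", "premium", 120),
    ("Furniture", "ErgoMax", "mid",      90),
    ("Electronics", "TypeWell", "mid",  200),
]

def cold_start_forecast(category, brand, price_tier):
    """Weight each neighbor by shared attributes (brand assumed strongest)."""
    weights, total = [], 0.0
    for cat, br, tier, units in existing:
        w = (cat == category) + 2 * (br == brand) + (tier == price_tier)
        weights.append((w, units))
        total += w
    if total == 0:
        return None  # no graph neighbors: fall back to a category-level prior
    return sum(w * u for w, u in weights) / total

# New ErgoMax premium furniture item with zero sales history:
print(round(cold_start_forecast("Furniture", "ErgoMax", "premium")))  # → 107
```

A richer model would also fold in browse and wishlist edges, but the principle is the same: the new product inherits signal from its position in the graph rather than from a manually chosen proxy SKU.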
Traditional retail AI
- Collaborative filtering on user-item matrix
- Each product forecasted independently
- Cold-start products use manual proxies
- Features manually engineered from single tables
- Cross-category signals missed entirely
Graph-based retail AI
- Recommendations from full relational graph
- Demand forecasting with cross-product network effects
- Cold-start solved through graph neighborhood
- Patterns learned automatically from table structure
- Cross-category discovery drives 2-3x higher revenue per impression
PQL Query
PREDICT next_purchase_product_id FOR EACH customers.customer_id WHERE customers.segment = 'Active'
One query generates personalized product recommendations for every active customer. The model traverses the full product-customer-transaction-category-brand graph to find cross-category signals.
Output
| customer_id | recommended_product | confidence | reasoning |
|---|---|---|---|
| C-8001 | P-3004 (Office Chair) | 0.84 | 3-hop: standing desk + keyboard buyers → chair |
| C-8002 | P-3005 (Desk Lamp) | 0.71 | 2-hop: ergonomic bundle pattern |
| C-8003 | P-3001 (Standing Desk) | 0.89 | 4-hop: similar customer upgrade path |
| C-8004 | P-3002 (Monitor) | 0.76 | 3-hop: electronics brand affinity transfer |
Customer lifetime value and retention
Customer lifetime value (CLV) prediction determines marketing spend, loyalty program design, and customer service prioritization. Most CLV models use RFM features (recency, frequency, monetary value) from the transaction table. These features are useful but miss the relational signals that differentiate high-value customers.
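For reference, the standard RFM computation is a few lines over the flat transaction table. The rows below are invented toy data:

```python
from datetime import date

# Toy transaction rows: (customer_id, purchase_date, amount). Illustrative.
transactions = [
    ("C-8001", date(2024, 3, 1), 180.0),
    ("C-8001", date(2024, 3, 20), 120.0),
    ("C-8001", date(2024, 4, 2), 200.0),
]
today = date(2024, 4, 10)

def rfm(customer_id, rows, as_of):
    """Recency (days since last purchase), frequency, monetary value."""
    mine = [(d, amt) for cid, d, amt in rows if cid == customer_id]
    recency = (as_of - max(d for d, _ in mine)).days
    frequency = len(mine)
    monetary = sum(amt for _, amt in mine)
    return recency, frequency, monetary

print(rfm("C-8001", transactions, today))  # → (8, 3, 500.0)
```

Every output here is a scalar aggregate of one table. Two customers with the same tuple are indistinguishable, no matter how different their referral, return, or engagement graphs look.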
A customer with $500 in purchases looks identical to another customer with $500 in purchases in a flat model. In the graph, one customer bought high-margin products, referred three friends (who are also active buyers), and engages with loyalty rewards weekly. The other bought only discounted items, has never referred anyone, and only shops during clearance events. Their lifetime values are dramatically different, but the difference is only visible in the relational structure.
For churn prediction specifically, graph-based models detect early warning signals that flat models miss: the customer's purchase frequency is declining while similar customers maintain theirs. The customer is spending more in categories that are available at competitors. The customer's engagement pattern matches customers who churned 60-90 days later.
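The first of those signals, frequency decline relative to a peer cohort, can be sketched as a simple rule. The threshold and the monthly counts are assumptions for illustration; a graph model learns the equivalent comparison from cohort structure rather than a hand-set cutoff:

```python
# Sketch of one early-warning signal: purchase-frequency decline relative
# to the customer's peer cohort. Threshold is an assumed value.
def frequency_decline_flag(customer_monthly, cohort_monthly, threshold=0.3):
    """Flag if the customer's frequency dropped while the cohort's held."""
    cust_drop = 1 - customer_monthly[-1] / max(customer_monthly[-3], 1e-9)
    cohort_drop = 1 - cohort_monthly[-1] / max(cohort_monthly[-3], 1e-9)
    return cust_drop - cohort_drop > threshold

# Customer fell from 4 to 2 purchases/month; cohort stayed roughly flat.
print(frequency_decline_flag([4, 3, 2], [4.1, 4.0, 3.9]))  # → True
```

The point of the cohort comparison: an absolute drop can be seasonal, but a drop the customer's peers did not share is a churn signal.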
Intervening with the right offer at the right time retains 15-25% of at-risk customers. For a retailer with 10 million active customers and 15% annual churn, retaining an additional 20% of at-risk customers preserves 300,000 customer relationships per year.
The foundation model opportunity
Building separate ML models for recommendations, demand forecasting, CLV prediction, and churn prevention requires four separate data pipelines, four separate feature engineering efforts, and four separate model maintenance budgets. For a mid-size retailer, this means $2M-5M annually in ML team costs and 12-24 months before all four models reach production.
A relational foundation model serves all four use cases from a single platform. KumoRFM connects to the retailer's data warehouse, understands the product-customer-transaction-category schema, and answers any prediction question without task-specific engineering. "Which customers will churn in the next 30 days?" "What is the expected demand for this SKU next week?" "Which products should we recommend to this user?" "What is this customer's projected lifetime value?"
One model. One data connection. Four use cases. Time to first prediction: minutes, not months.
The retailers that adopt this approach will not just have better models. They will have the ability to ask questions about their data that were previously too expensive to answer. And in retail, where margins are 3-5% and every percentage point of conversion matters, the ability to predict faster and more accurately is the difference between growing and losing market share.