The business problem
Cross-sell and upsell generate 10-30% of revenue for mature e-commerce and B2B companies. Amazon attributes 35% of revenue to its recommendation engine, much of which is cross-sell (frequently bought together, customers also bought). The opportunity is enormous: acquiring a new customer costs 5-25x more than expanding an existing relationship.
Association rules (“people who bought X also bought Y”) capture obvious co-purchase patterns but miss the sequential, multi-hop journeys that drive expansion: a camera buyer becomes a lens buyer, then a tripod buyer, then a lighting buyer. Each step in the journey depends on the previous purchases and the behavior of similar customers.
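To make that concrete, here is a toy association-rule computation (lift of the rule "camera -> lens") over co-purchase item sets. Note that it sees co-occurrence only, never order or timing, which is exactly the limitation described above. The transactions are invented for illustration:

```python
def association_lift(transactions, x, y):
    """Lift of the rule x -> y: P(x and y) / (P(x) * P(y))."""
    n = len(transactions)
    cx = sum(1 for t in transactions if x in t)
    cy = sum(1 for t in transactions if y in t)
    cxy = sum(1 for t in transactions if x in t and y in t)
    return (cxy / n) / ((cx / n) * (cy / n))

# Toy co-purchase baskets (no timestamps, no order)
transactions = [
    {"camera", "lens"},
    {"camera", "lens", "tripod"},
    {"camera"},
    {"lens"},
    {"tripod", "lighting"},
]
lift = association_lift(transactions, "camera", "lens")  # > 1 means positive association
```

A lift above 1 flags "camera and lens co-occur more than chance", but nothing here can say that the camera came first or that the lens followed within 30 days.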
Why flat ML fails
- No sequence awareness: Flat models predict next-purchase from current features. They miss that the purchase of a specific product often triggers a predictable sequence of follow-on purchases.
- No cross-customer learning: Customer A bought a camera, then a lens. Customer B just bought a camera. The graph connects A and B through the shared camera node, so A's follow-on pattern transfers to B.
- No category crossing: Cross-category expansion (electronics to accessories to services) requires multi-hop graph traversal that flat features cannot capture.
- Timing matters: The cross-sell window varies by product pair. Camera-to-lens is 30 days. Software-to-training is 90 days. Temporal edge features capture this.
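The per-pair cross-sell window mentioned in the last bullet can be estimated directly from purchase history. A minimal sketch, assuming purchase logs keyed by customer (toy data; the function name and structure are illustrative):

```python
from datetime import date
from statistics import median

def cross_sell_window(purchases, src, dst):
    """Median days between a customer buying `src` and their next `dst` purchase.

    `purchases` maps customer_id -> list of (product, purchase_date) tuples.
    """
    gaps = []
    for history in purchases.values():
        history = sorted(history, key=lambda p: p[1])
        src_dates = [d for p, d in history if p == src]
        dst_dates = [d for p, d in history if p == dst]
        for s in src_dates:
            later = [d for d in dst_dates if d > s]
            if later:
                gaps.append((min(later) - s).days)
                break  # count each customer's journey once
    return median(gaps) if gaps else None

purchases = {
    "c1": [("camera", date(2024, 1, 1)), ("lens", date(2024, 1, 25))],
    "c2": [("camera", date(2024, 2, 1)), ("lens", date(2024, 3, 5))],
    "c3": [("camera", date(2024, 3, 1))],  # no follow-on purchase yet
}
window = cross_sell_window(purchases, "camera", "lens")
```

Windows computed this way can be attached to the product-to-product edges as temporal features.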
The relational schema
Node types:
Customer (id, segment, signup_date, total_spend)
Product (id, category, price, product_line)
Category (id, name, avg_expansion_rate)
Edge types:
Customer --[purchased]--> Product (amount, timestamp, channel)
Product --[upsell_to]--> Product (conversion_rate)
Product --[cross_sell]--> Product (lift_factor)
Product --[in_category]--> Category
Purchase edges carry timestamps for sequence modeling. Product-to-product edges encode known cross-sell and upsell relationships.
PyG architecture: SAGEConv for next-product prediction
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv, HeteroConv, Linear


class CrossSellGNN(torch.nn.Module):
    def __init__(self, hidden_dim=128):
        super().__init__()
        self.customer_lin = Linear(-1, hidden_dim)
        self.product_lin = Linear(-1, hidden_dim)
        # The reverse 'rev_purchased' relation (added to the data with
        # T.ToUndirected(), for example) lets customer nodes receive
        # messages; without it HeteroConv only updates product nodes
        # and the customer embeddings never change.
        self.conv1 = HeteroConv({
            ('customer', 'purchased', 'product'): SAGEConv(
                hidden_dim, hidden_dim),
            ('product', 'rev_purchased', 'customer'): SAGEConv(
                hidden_dim, hidden_dim),
            ('product', 'upsell_to', 'product'): SAGEConv(
                hidden_dim, hidden_dim),
            ('product', 'cross_sell', 'product'): SAGEConv(
                hidden_dim, hidden_dim),
        }, aggr='sum')
        self.conv2 = HeteroConv({
            ('customer', 'purchased', 'product'): SAGEConv(
                hidden_dim, hidden_dim),
            ('product', 'rev_purchased', 'customer'): SAGEConv(
                hidden_dim, hidden_dim),
            ('product', 'upsell_to', 'product'): SAGEConv(
                hidden_dim, hidden_dim),
            ('product', 'cross_sell', 'product'): SAGEConv(
                hidden_dim, hidden_dim),
        }, aggr='sum')

    def encode(self, x_dict, edge_index_dict):
        x_dict['customer'] = self.customer_lin(x_dict['customer'])
        x_dict['product'] = self.product_lin(x_dict['product'])
        x_dict = {k: F.relu(v)
                  for k, v in self.conv1(x_dict, edge_index_dict).items()}
        x_dict = self.conv2(x_dict, edge_index_dict)
        return x_dict

    def predict_next(self, customer_emb, product_emb):
        # Score all candidate products for each customer
        return torch.sigmoid(customer_emb @ product_emb.T)

SAGEConv propagates purchase patterns through the product graph. The model scores all candidate products for each customer, ranking by purchase probability.
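At inference time, the embeddings returned by `encode` are turned into a ranked recommendation list. A sketch with toy tensors standing in for the model outputs (the ownership mask is hypothetical; in production it would come from the purchase table):

```python
import torch

torch.manual_seed(0)
customer_emb = torch.randn(2, 8)   # stand-ins for encode() customer outputs
product_emb = torch.randn(4, 8)    # stand-ins for encode() product outputs

# Purchase-probability score for every (customer, product) pair
scores = torch.sigmoid(customer_emb @ product_emb.T)   # shape (2, 4)

# Mask products each customer already owns so they are never recommended
owned = torch.tensor([[False, True, False, False],
                      [False, False, False, True]])
ranked = scores.masked_fill(owned, float('-inf'))
top2 = ranked.topk(k=2, dim=1).indices   # top-2 cross-sell candidates per customer
```

Masking before `topk` matters: without it the highest-scoring "recommendation" is often a product the customer already bought.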
Expected performance
- Association rules: ~45 AUROC
- LightGBM (flat features): 62.44 AUROC
- GNN (SAGEConv): 75.83 AUROC
- KumoRFM (zero-shot): 76.71 AUROC
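For context on the numbers above (reported on a 0-100 scale): AUROC is the probability that the model scores a randomly chosen positive customer-product pair above a randomly chosen negative one, so 50 is random guessing. A minimal rank-based implementation:

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairwise "wins" of positives over negatives; ties count half
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Evaluated on held-out (customer, next-product) pairs, this is the metric the comparison table reports.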
Or use KumoRFM in one line
PREDICT next_product FOR customer
USING customer, product, purchase, category
One PQL query. KumoRFM discovers cross-sell and upsell patterns from temporal purchase sequences automatically.