A co-purchase graph connects products that the same customers frequently buy together. For example, if 30% of customers who buy a camera also buy a memory card within the same session, an edge connects those two products, weighted by the co-purchase frequency. The resulting graph captures product complementarity (items bought together) and, through cluster structure, substitutability (items that compete for the same need).
Construction
Building a co-purchase graph from transaction data:
- Define co-purchase window: Products bought by the same customer within a time window (same session, same day, or same week) are considered co-purchased.
- Count co-occurrences: For each product pair, count how many customers bought both within the window.
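- Filter and normalize: Remove low-count edges, which are mostly noise from random co-occurrence. Normalize edge weights by product popularity (for example, with lift or pointwise mutual information) so that best-selling items do not dominate every neighborhood.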
- Add node features: Product attributes (price, category, brand, description embedding, rating, review count).
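The counting, filtering, and normalizing steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: it assumes the transactions have already been grouped into one basket per customer and window, and it uses lift as the popularity normalization, which is one common choice among several. The function name `build_copurchase_graph`, the `min_count` parameter, and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_copurchase_graph(baskets, min_count=2):
    """Return {(a, b): lift} for product pairs co-purchased at least min_count times.

    baskets: iterable of product-id collections, one per customer-window
    (assumed to be grouped upstream by the chosen co-purchase window).
    """
    baskets = [set(b) for b in baskets]
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        item_count.update(basket)
        # sorted() gives every unordered pair a single canonical key
        pair_count.update(combinations(sorted(basket), 2))

    edges = {}
    for (a, b), count in pair_count.items():
        if count < min_count:
            continue  # filter: drop low-count edges as likely noise
        # normalize: lift = P(a, b) / (P(a) * P(b))
        edges[(a, b)] = (count / n) / ((item_count[a] / n) * (item_count[b] / n))
    return edges


# Hypothetical toy data: six baskets
baskets = [
    {"camera", "memory_card", "tripod"},
    {"camera", "memory_card"},
    {"memory_card", "card_reader"},
    {"phone", "phone_case"},
    {"phone", "phone_case"},
    {"camera", "phone_case"},
]
edges = build_copurchase_graph(baskets, min_count=2)
for pair, lift in sorted(edges.items()):
    print(pair, round(lift, 2))
```

A lift above 1 means the pair is bought together more often than independent purchases would predict. Pairs seen only once, such as camera and phone case here, are dropped by the `min_count` filter.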
What GNNs learn from co-purchase structure
1-hop: Direct complements
A camera node's immediate neighbors are memory cards, camera bags, tripods, and lens filters. After one layer of message passing, the camera's representation encodes “I am typically bought with photography accessories.”
2-hop: Indirect relationships
Consider the path camera → memory card → card reader. The camera does not directly co-occur with card readers, but both connect to the memory card, so information about the card reader reaches the camera in two steps. Two layers of message passing let the camera representation encode this second-order pattern.
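The way each extra layer widens a node's view can be seen with a toy example. The sketch below uses plain mean aggregation with no learned weights or nonlinearity, which is a deliberate simplification of a real GNN layer. Each node starts with a one-hot feature, so any non-zero entry in a vector shows which nodes have contributed to it. The helper `mean_aggregate` and the three-node graph are illustrative assumptions.

```python
def mean_aggregate(adj, h):
    """One round of message passing: average a node's vector with its neighbors' vectors."""
    out = {}
    for node, neighbors in adj.items():
        group = [node] + list(neighbors)  # include the node itself (self-loop)
        dim = len(h[node])
        out[node] = [sum(h[m][i] for m in group) / len(group) for i in range(dim)]
    return out


nodes = ["camera", "memory_card", "card_reader"]
adj = {
    "camera": ["memory_card"],
    "memory_card": ["camera", "card_reader"],
    "card_reader": ["memory_card"],
}
# One-hot input features: dimension j is 1.0 only for node j
h0 = {n: [1.0 if i == j else 0.0 for j in range(len(nodes))] for i, n in enumerate(nodes)}

h1 = mean_aggregate(adj, h0)  # after one layer
h2 = mean_aggregate(adj, h1)  # after two layers

reader = nodes.index("card_reader")
print(h1["camera"][reader])  # 0.0: the card reader is not yet visible to the camera
print(h2["camera"][reader] > 0)  # True: two layers reach the 2-hop neighbor
```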
Cluster structure: Substitutes
Products that are substitutes (iPhone vs Samsung Galaxy) rarely appear in the same basket but share many of the same co-purchase neighbors (phone cases, screen protectors, chargers). In the GNN embedding space, substitutes cluster together even without direct edges, because their neighborhoods are similar.
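A rough way to see why substitutes end up close together is to compare neighbor sets directly. The sketch below uses Jaccard overlap as a simple stand-in for what a GNN captures implicitly in its embeddings. It is not the GNN computation itself, and the neighbor sets are made up for illustration.

```python
def jaccard(a, b):
    """Share of co-purchase neighbors that two products have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical 1-hop co-purchase neighborhoods; note there is no iphone-galaxy edge
neighbors = {
    "iphone": {"phone_case", "screen_protector", "charger"},
    "galaxy": {"phone_case", "screen_protector", "charger", "sd_card"},
    "camera": {"memory_card", "tripod", "camera_bag"},
}

sim_phones = jaccard(neighbors["iphone"], neighbors["galaxy"])
sim_mixed = jaccard(neighbors["iphone"], neighbors["camera"])
print(sim_phones, sim_mixed)  # 0.75 0.0
```

High overlap without a direct edge is the signature of a substitute pair. High overlap together with a strong direct edge more likely indicates complements within the same bundle.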
Applications
- Recommendations: “Frequently bought together” powered by 1-hop neighbors. “Customers also bought” powered by 2-hop GNN representations.
- Bundle pricing: Identify product bundles (strongly connected subgraphs) and optimize bundle discounts based on co-purchase strength.
- Demand forecasting: Demand for complementary products is correlated. A spike in camera sales predicts memory card demand. The co-purchase graph encodes these correlations.
- Category discovery: Graph clustering on co-purchase structure reveals natural product categories that may differ from the retailer's taxonomy.
- Cold-start products: A new product with no purchase history but known category and attributes can borrow representations from its nearest neighbors in feature space.
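The cold-start idea in the list above can be sketched as a nearest-neighbor lookup. This minimal version assumes every existing product has both a feature vector and a trained embedding, picks the `k` most similar products by cosine similarity on features, and averages their embeddings. The function names, the value of `k`, and all numbers are hypothetical. An inductive GNN that computes embeddings from features directly would be an alternative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cold_start_embedding(new_features, features, embeddings, k=2):
    """Average the embeddings of the k products closest to new_features."""
    ranked = sorted(features, key=lambda p: cosine(new_features, features[p]), reverse=True)
    nearest = ranked[:k]
    dim = len(next(iter(embeddings.values())))
    avg = [sum(embeddings[p][i] for p in nearest) / len(nearest) for i in range(dim)]
    return avg, nearest


# Hypothetical attribute vectors and trained GNN embeddings for existing products
features = {
    "tripod_a": [1.0, 0.0, 0.2],
    "tripod_b": [0.9, 0.1, 0.3],
    "phone_case": [0.0, 1.0, 0.1],
}
embeddings = {
    "tripod_a": [0.8, 0.1],
    "tripod_b": [0.6, 0.3],
    "phone_case": [-0.5, 0.9],
}

new_tripod_features = [0.95, 0.05, 0.25]  # a new product with no purchase history
estimate, nearest = cold_start_embedding(new_tripod_features, features, embeddings, k=2)
print(nearest, estimate)
```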
Heterogeneous product graphs
Co-purchase edges are just one type. A rich product graph includes:
- Co-purchase: Bought together (strong intent signal)
- Co-view: Viewed in the same session (weaker but abundant)
- Co-review: Reviewed by the same customer
- Same-category: Share a taxonomy category
- Same-brand: From the same manufacturer
With relation-type encoding, the GNN learns separate weights for each edge type, so it can combine strong purchase signals with weaker but complementary behavioral signals.
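To make per-relation weighting concrete, here is a small sketch in the spirit of relational GNN layers. It simplifies heavily: each relation gets a single scalar weight instead of a weight matrix, and the weights are fixed by hand, whereas a trained model would learn them. The function `relational_aggregate`, the relation names, and the weight values are assumptions for illustration.

```python
def relational_aggregate(h, edges_by_type, relation_weight, self_weight=1.0):
    """One simplified relational layer: per-relation mean of neighbors, scaled per relation.

    h: {node: feature vector}
    edges_by_type: {relation: [(source, target), ...]}
    relation_weight: {relation: scalar weight} (hand-set here, learned in practice)
    """
    dim = len(next(iter(h.values())))
    out = {n: [self_weight * x for x in vec] for n, vec in h.items()}
    for rel, edges in edges_by_type.items():
        incoming = {}
        for src, dst in edges:
            incoming.setdefault(dst, []).append(src)
        for dst, srcs in incoming.items():
            for i in range(dim):
                # mean over this relation's neighbors, then scale by the relation weight
                mean_i = sum(h[s][i] for s in srcs) / len(srcs)
                out[dst][i] += relation_weight[rel] * mean_i
    return out


h = {"camera": [1.0, 0.0], "memory_card": [0.0, 1.0], "lens": [1.0, 1.0]}
edges_by_type = {
    "co_purchase": [("memory_card", "camera")],
    "co_view": [("lens", "camera")],
}
# Hypothetical weights: purchases count more than views
relation_weight = {"co_purchase": 0.8, "co_view": 0.2}

out = relational_aggregate(h, edges_by_type, relation_weight)
print(out["camera"])
```

Here the co-purchased memory card contributes four times as much to the camera's updated vector as the co-viewed lens, which mirrors the strong-versus-weak signal distinction in the list above.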