
Co-Purchase Graphs: Products Linked by Being Bought Together

When customers who buy a camera also buy a memory card and a case, those products form a triangle in the co-purchase graph. GNNs learn these product relationships to power recommendations, bundle pricing, and demand forecasting.


TL;DR

  • Co-purchase graphs connect products frequently bought together by the same customers, with edge weights reflecting co-purchase frequency. The graph captures complementarity (camera + case) and substitutability (iPhone vs Samsung).
  • GNN representations embed complementary products nearby and substitutes in the same region. 2-hop paths reveal indirect relationships: camera buyers buy memory cards, memory card buyers buy card readers.
  • The ogbn-products dataset (2.4M products, 61M edges), built from Amazon co-purchase data, is the standard large-scale benchmark: 47 product categories, node classification task.
  • Applications: "frequently bought together" recommendations, bundle pricing optimization, demand forecasting (correlated demand through graph edges), and category/taxonomy discovery.
  • Combine co-purchase with co-view, co-review, and shared-category edges for a rich heterogeneous product graph. Each edge type carries different signal strength.

A co-purchase graph connects products that are frequently bought together by the same customers. If 30% of customers who buy a camera also buy a memory card within the same session, an edge connects those two products, weighted by the co-purchase frequency. The resulting graph captures product complementarity (items bought together) and, through cluster structure, substitutability (items that compete for the same need).

Construction

Building a co-purchase graph from transaction data:

  1. Define co-purchase window: Products bought by the same customer within a time window (same session, same day, or same week) are considered co-purchased.
  2. Count co-occurrences: For each product pair, count how many customers bought both within the window.
  3. Filter and normalize: Remove low-count edges (noise from random co-occurrence). Normalize by product popularity to avoid popular items dominating.
  4. Add node features: Product attributes (price, category, brand, description embedding, rating, review count).
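The four steps above can be sketched in plain Python. The toy `baskets`, the `MIN_COUNT` threshold, and the min-popularity normalization are illustrative choices, not the only reasonable ones (PMI or lift normalization are common alternatives):

```python
from collections import Counter
from itertools import combinations

# Toy transactions: each basket is one customer's co-purchase window
# (hypothetical data for illustration).
baskets = [
    {"camera", "memory_card", "case"},
    {"camera", "memory_card"},
    {"camera", "tripod"},
    {"memory_card", "card_reader"},
]

pair_counts = Counter()
item_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1  # step 2: count co-occurrences

MIN_COUNT = 2  # step 3: drop edges from random co-occurrence
edges = {}
for (a, b), c in pair_counts.items():
    if c < MIN_COUNT:
        continue
    # Normalize by the less popular item so popular products don't dominate.
    edges[(a, b)] = c / min(item_counts[a], item_counts[b])
```

Here only the camera–memory card pair survives the count filter; node features (step 4) would then be attached per product before training.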

What GNNs learn from co-purchase structure

1-hop: Direct complements

A camera node's immediate neighbors are memory cards, camera bags, tripods, and lens filters. After one layer of message passing, the camera's representation encodes “I am typically bought with photography accessories.”

2-hop: Indirect relationships

Camera → memory card → card reader. The camera does not directly co-occur with card readers, but through the memory card connection, the GNN learns this indirect relationship. Two layers of message passing let the camera representation encode this second-order pattern.
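A minimal sketch of this two-hop propagation, using plain mean aggregation over a hand-written adjacency list (the products are invented, and the unweighted `mean_aggregate` stands in for a real GNN layer with learned parameters):

```python
# Toy co-purchase graph as adjacency lists (hypothetical products).
adj = {
    "camera": ["memory_card", "case"],
    "memory_card": ["camera", "card_reader"],
    "case": ["camera"],
    "card_reader": ["memory_card"],
}
# Each product starts with a one-hot feature: it only knows itself.
products = list(adj)
feats = {p: [1.0 if p == q else 0.0 for q in products] for p in products}

def mean_aggregate(h):
    """One round of message passing: average self + neighbor features."""
    out = {}
    for p, nbrs in adj.items():
        vecs = [h[p]] + [h[n] for n in nbrs]
        out[p] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return out

h1 = mean_aggregate(feats)  # 1-hop: camera sees memory_card, case
h2 = mean_aggregate(h1)     # 2-hop: camera now picks up card_reader

idx = products.index("card_reader")
```

After one layer the camera's representation carries no card-reader signal; after two layers it does, because the signal flowed camera ← memory card ← card reader.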

Cluster structure: Substitutes

Products that are substitutes (iPhone vs Samsung Galaxy) rarely appear in the same basket but share many of the same co-purchase neighbors (phone cases, screen protectors, chargers). In the GNN embedding space, substitutes cluster together even without direct edges, because their neighborhoods are similar.
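One way to see why: the neighbor sets of two substitutes overlap heavily even though the products themselves never co-occur. A toy Jaccard-overlap check (the products and neighbor sets are made up):

```python
# Substitutes rarely share an edge but share neighbors (hypothetical data).
neighbors = {
    "iphone": {"case_a", "screen_protector", "charger"},
    "galaxy": {"case_b", "screen_protector", "charger"},
    "camera": {"memory_card", "tripod"},
}

def jaccard(a, b):
    """Overlap of two products' co-purchase neighborhoods, in [0, 1]."""
    return len(neighbors[a] & neighbors[b]) / len(neighbors[a] | neighbors[b])
```

The two phones overlap on half their neighborhood while the camera overlaps with neither, which is exactly the similarity a GNN's aggregation step picks up when it embeds substitutes nearby.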

Applications

  • Recommendations: “Frequently bought together” powered by 1-hop neighbors. “Customers also bought” powered by 2-hop GNN representations.
  • Bundle pricing: Identify product bundles (strongly connected subgraphs) and optimize bundle discounts based on co-purchase strength.
  • Demand forecasting: Demand for complementary products is correlated. A spike in camera sales predicts memory card demand. The co-purchase graph encodes these correlations.
  • Category discovery: Graph clustering on co-purchase structure reveals natural product categories that may differ from the retailer's taxonomy.
  • Cold-start products: A new product with no purchase history but known category and attributes can borrow representations from its nearest neighbors in feature space.
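For the recommendation use case, a sketch of nearest-neighbor retrieval in embedding space. The 2-d embeddings here are invented placeholders; in practice they would come from a trained GNN:

```python
import math

# Hypothetical learned product embeddings (normally the output of a GNN).
emb = {
    "camera":      [0.9, 0.1],
    "memory_card": [0.8, 0.2],
    "toaster":     [0.1, 0.9],
}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def recommend(product, k=1):
    """Return the k products closest to `product` in embedding space."""
    scores = {q: cosine(emb[product], e) for q, e in emb.items() if q != product}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

At production scale the brute-force loop would be replaced by an approximate nearest-neighbor index, but the retrieval logic is the same.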

Heterogeneous product graphs

Co-purchase edges are just one type. A rich product graph includes:

  • Co-purchase: Bought together (strong intent signal)
  • Co-view: Viewed in the same session (weaker but abundant)
  • Co-review: Reviewed by the same customer
  • Same-category: Share a taxonomy category
  • Same-brand: From the same manufacturer

Using relation type encoding, the GNN learns different weights for each edge type, combining strong purchase signals with weaker but complementary behavioral signals.
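A minimal sketch of per-relation weighting, with fixed constants standing in for the relation weights a GNN would learn (the edge lists and weight values are invented for illustration):

```python
# Edges grouped by relation type, as in a heterogeneous product graph.
edge_index = {
    "co_purchase": [("camera", "memory_card")],
    "co_view":     [("camera", "galaxy"), ("camera", "tripod")],
    "same_brand":  [("camera", "lens")],
}
# Fixed stand-ins for learned per-relation weights: purchase > brand > view.
relation_weight = {"co_purchase": 1.0, "co_view": 0.3, "same_brand": 0.5}

def weighted_neighbors(product):
    """Score neighbors with per-relation weights, as a relational GNN layer would."""
    scores = {}
    for rel, pairs in edge_index.items():
        for u, v in pairs:
            if u == product:
                scores[v] = scores.get(v, 0.0) + relation_weight[rel]
    return scores
```

The same idea, with learned weights and neural message functions per relation, is what relational GNN layers (and PyTorch Geometric's heterogeneous wrappers) implement.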

Frequently asked questions

What is a co-purchase graph?

A co-purchase graph is a product-product graph where edges connect products that are frequently bought together by the same customers. If customers who buy product A often also buy product B, an edge (weighted by co-purchase frequency) connects A and B. The resulting graph captures product complementarity and substitutability.

How are co-purchase graphs used for recommendations?

GNNs on co-purchase graphs learn product representations where complementary products (phone + case) are embedded nearby. For a customer's recent purchases, the model recommends products close in the embedding space to what they bought. This captures 'frequently bought together' patterns and extends them through multi-hop paths.

What is the difference between co-purchase and co-view graphs?

Co-purchase edges are strong signals (user spent money). Co-view edges are weaker but more abundant (user looked but may not buy). In practice, both are used as separate edge types in a heterogeneous graph, with the GNN learning different weights for each signal type.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.