A bipartite graph is a graph whose nodes split into two disjoint sets, with every edge connecting a node from one set to a node from the other. Users and products form a bipartite graph: users buy products, but users do not buy other users, and products do not buy other products. Every edge crosses the boundary between the two sets.
This structure appears everywhere in enterprise systems: customers and products, job seekers and job listings, patients and medications, students and courses, advertisers and publishers. Whenever your data has two fundamentally different entity types connected by interactions, you have a bipartite graph.
Why bipartite structure matters
When you apply message passing to a bipartite graph, information flows in a specific pattern:
- Layer 1: user nodes receive messages from their purchased items. Item nodes receive messages from their buyers.
- Layer 2: user nodes now receive messages from items, which already absorbed information from other buyers. Users who share items get similar representations.
This is collaborative filtering implemented as message passing. Users who bought similar items naturally end up with similar embeddings, and items bought by similar users naturally end up with similar embeddings. The GNN learns this structure automatically.
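The two-layer flow above can be sketched with plain tensor operations, no GNN library required. A minimal illustration on a made-up toy graph (three users, two items), using mean aggregation:

```python
import torch

# Toy bipartite graph: users 0 and 1 both bought item 0; user 2 bought item 1.
x_users = torch.eye(3)          # one-hot user features

# Dense user-item adjacency for clarity
A = torch.tensor([[1., 0.],     # user 0 -> item 0
                  [1., 0.],     # user 1 -> item 0
                  [0., 1.]])    # user 2 -> item 1

def mean_agg(adj, x):
    # Mean of neighbor features (each row of adj selects one node's neighbors)
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    return (adj @ x) / deg

# Layer 1: items absorb their buyers' features
h_items = mean_agg(A.t(), x_users)   # [2, 3]
# Layer 2: users absorb item representations
h_users = mean_agg(A, h_items)       # [3, 3]

# Users 0 and 1 (co-buyers of item 0) end up with identical embeddings;
# user 2, who shares no items with them, does not.
print(torch.allclose(h_users[0], h_users[1]))  # True
print(torch.allclose(h_users[0], h_users[2]))  # False
```

A real GNN layer adds learned weights and nonlinearities around this aggregation, but the collaborative-filtering effect comes from the two-hop structure itself.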
Bipartite graphs in PyG
PyG supports bipartite message passing natively. Pass a tuple of feature matrices instead of a single tensor:
import torch
from torch_geometric.nn import SAGEConv
# 100 users with 16 features, 500 items with 32 features
x_users = torch.randn(100, 16)
x_items = torch.randn(500, 32)
# Bipartite edges: user -> item (purchases)
edge_index = torch.stack([
    torch.randint(0, 100, (2000,)),  # row 0: source user indices
    torch.randint(0, 500, (2000,)),  # row 1: destination item indices
])
# SAGEConv accepts bipartite input as (x_src, x_dst) tuple
conv = SAGEConv((16, 32), 64) # (src_dim, dst_dim) -> out_dim
# Messages flow from users to items
out_items = conv((x_users, x_items), edge_index)
# out_items.shape: [500, 64]
# Reverse edges for items -> users message passing
conv_rev = SAGEConv((32, 16), 64)
out_users = conv_rev((x_items, x_users), edge_index.flip(0))
Bipartite message passing: source and destination have different feature dimensions. Flip edges for the reverse direction.
Enterprise example: product recommendations
Consider an e-commerce platform with 10 million users and 500,000 products. The bipartite graph connects users to products they purchased, clicked, or added to cart:
- User features: demographics, account age, average spend
- Item features: price, category embedding, rating, inventory
- Edge features: interaction type (purchase/click/cart), timestamp, quantity
After 2 layers of bipartite message passing, each user's embedding encodes not just their own features but the characteristics of their purchased items and, transitively, the behavior of users who bought similar items. This is strictly more powerful than matrix factorization because it incorporates node features and multi-hop collaborative signal.
from torch_geometric.data import HeteroData
from torch_geometric.nn import SAGEConv, to_hetero
import torch
class BipartiteGNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # (-1, -1) lazily infers the source and destination input dims,
        # which differ here (users: 16 features, products: 32)
        self.conv1 = SAGEConv((-1, -1), 128)
        self.conv2 = SAGEConv((-1, -1), 64)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)
# Build bipartite HeteroData
data = HeteroData()
data['user'].x = torch.randn(10000, 16)
data['product'].x = torch.randn(5000, 32)
data['user', 'purchased', 'product'].edge_index = ...
# Add reverse edges so user nodes also receive messages;
# to_hetero requires every node type's representation to get updated
import torch_geometric.transforms as T
data = T.ToUndirected()(data)
# Convert to heterogeneous model
model = BipartiteGNN()
model = to_hetero(model, data.metadata(), aggr='sum')
# Output: user and product embeddings in shared 64-dim space
out = model(data.x_dict, data.edge_index_dict)
# Score: dot product between user and product embeddings
A bipartite recommendation model using HeteroData. User and product embeddings land in the same space for scoring.
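Once user and product embeddings share a space, scoring is a dot product and recommendation is a top-k over products. A sketch of the serving step, with random stand-in tensors in place of the model output and a hypothetical `recommend` helper:

```python
import torch

# Stand-ins for out['user'] and out['product'] from a trained model
user_emb = torch.randn(10000, 64)
prod_emb = torch.randn(5000, 64)

def recommend(user_id, user_emb, prod_emb, k=5):
    # Score every product for one user with a dot product,
    # then return the indices of the k highest-scoring products
    scores = prod_emb @ user_emb[user_id]   # [num_products]
    return scores.topk(k).indices

top5 = recommend(42, user_emb, prod_emb)
print(top5.shape)  # torch.Size([5])
```

For training, the same dot product typically feeds a ranking loss (e.g. BPR or binary cross-entropy against negative samples) on observed versus unobserved edges.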
Scaling bipartite GNNs
Bipartite graphs in enterprise settings are often extremely large (millions of users, millions of items, billions of edges). Key scaling strategies:
- Neighbor sampling: sample a fixed number of neighbors per node instead of using the full neighborhood. PyG's NeighborLoader handles this.
- Two-tower architecture: encode users and items independently, then score with a dot product. This allows precomputing all item embeddings offline and serving recommendations with approximate nearest neighbor search.
- Mini-batch training: sample subgraphs containing target edges and their local neighborhoods. Each batch trains on a small bipartite subgraph.