
Graph-Based Recommendations: Why Graphs Outperform Collaborative Filtering

Collaborative filtering sees a sparse matrix. Graph neural networks see a rich network: users connect to items, items connect to categories, categories connect to brands. Multi-hop message passing captures the full context that matrix factorization compresses away.

PyTorch Geometric

TL;DR

  • Traditional collaborative filtering factorizes a user-item matrix into latent vectors. Graph-based recommendation represents users, items, and metadata as a bipartite (or heterogeneous) graph and uses GNNs to learn embeddings through structure.
  • LightGCN strips GCN down to pure neighborhood aggregation for recommendations: no feature transforms, no nonlinearities. It improves Recall@20 by 10-16% over matrix factorization on standard benchmarks.
  • The graph advantage is multi-hop context. A user who bought running shoes is 2 hops from all other running shoe buyers and 3 hops from the products those buyers also purchased. This path structure is the recommendation signal.
  • Side information (categories, brands, social connections, user demographics) integrates naturally as additional node types and edges. Matrix factorization requires manual feature engineering to incorporate these signals.
  • PinSage proved graph recommendations scale: 3 billion nodes, 18 billion edges, deployed at Pinterest with 40% engagement improvement. Neighbor sampling and mini-batching make it tractable.

Graph-based recommendation systems outperform collaborative filtering by learning from the full network of user-item interactions, not just a sparse matrix. When you represent users, items, and their relationships as a graph, GNNs can propagate information across multi-hop paths: a user connects to purchased items, those items connect to their categories and other buyers, those buyers connect to their other purchases. This path structure is the recommendation signal.

The limitation of matrix factorization

Collaborative filtering represents interactions as a user-item matrix and decomposes it into low-rank factors. User u's preference for item i is the dot product of their latent vectors. This works well when the matrix is dense enough, but suffers from three structural limitations:

  • Sparsity: Most users interact with less than 0.1% of items. The matrix is 99.9% empty.
  • No side information: Item categories, brands, user demographics, and social connections must be engineered as additional features rather than being part of the model structure.
  • No path reasoning: The model cannot learn that “users who buy X also browse Y, and Y is made by the same brand as Z” because it only sees direct user-item interactions.
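To make the factorization picture concrete, here is a minimal sketch in PyTorch with made-up sizes: the sparse interaction matrix R is approximated as a product of two low-rank factor matrices, and a preference score is a single dot product.

```python
import torch

# Matrix-factorization view: approximate the sparse interaction matrix
# R (num_users x num_items) as U @ V.T with low-rank factors.
# All sizes and values here are illustrative, not from the article.
num_users, num_items, rank = 100, 500, 16
U = torch.randn(num_users, rank)   # user latent vectors
V = torch.randn(num_items, rank)   # item latent vectors

# Predicted preference of user u for item i is a single dot product.
u, i = 3, 42
score = U[u] @ V[i]

# The reconstruction is dense even though the observed R is ~99.9% empty.
R_hat = U @ V.T                    # shape: (100, 500)
```

Note that U and V only see direct user-item entries; there is no place in this formulation for categories, brands, or paths through other users.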

The graph formulation

A recommendation graph is bipartite at its core: user nodes connect to item nodes through interaction edges (purchased, viewed, rated). Adding metadata makes it heterogeneous:

  • User nodes: features include demographics, account age, activity level
  • Item nodes: features include price, description embeddings, popularity
  • Category/Brand nodes: item metadata becomes first-class entities
  • Social edges: user-follows-user captures social influence
  • Interaction edges: purchase, view, rating, add-to-cart with timestamps

LightGCN: the baseline that beat everything

LightGCN (He et al., 2020) showed that for recommendation, simpler is better. It removes feature transformations and nonlinear activations from GCN, keeping only the core operation: neighborhood aggregation.

lightgcn_layer.py
import torch
from torch_geometric.nn import LGConv

class LightGCN(torch.nn.Module):
    def __init__(self, num_users, num_items, embedding_dim, num_layers):
        super().__init__()
        self.num_users = num_users
        # Users occupy indices [0, num_users); items occupy
        # [num_users, num_users + num_items) in one shared embedding table.
        self.embedding = torch.nn.Embedding(
            num_users + num_items, embedding_dim
        )
        self.convs = torch.nn.ModuleList(
            [LGConv() for _ in range(num_layers)]
        )

    def forward(self, edge_index):
        x = self.embedding.weight
        xs = [x]
        for conv in self.convs:
            x = conv(x, edge_index)
            xs.append(x)
        # Average across all layers (not just final)
        return torch.stack(xs, dim=0).mean(dim=0)

LightGCN averages embeddings across all layers. Layer 0 is the raw embedding, layer 1 includes 1-hop neighbors, and layer 2 includes 2-hop neighbors. The average captures signals at multiple scales.

Why layer averaging works

Each layer captures a different scale of the graph. Layer 0 is the node's own embedding. Layer 1 incorporates direct neighbors (items a user interacted with). Layer 2 incorporates the neighbors of those items (other users who also interacted with them, plus their other items). Averaging across layers combines all these scales into a single representation.

Multi-hop paths are the recommendation signal

Consider recommending a product to user Alice. In the graph:

  • 1-hop: Alice's directly purchased items (already known)
  • 2-hop: Other users who bought the same items as Alice (similar users)
  • 3-hop: Items those similar users bought that Alice has not seen yet (recommendations)

With 2-3 layers of message passing, Alice's embedding already encodes this 3-hop neighborhood. Scoring Alice against a candidate item is a single dot product. The entire multi-hop reasoning is baked into the embedding through graph structure.
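The scoring step can be sketched in plain PyTorch. The embedding tensor below is a stand-in for the layer-averaged output a trained model would produce, with users first and items after, as in the LightGCN class above; the user index is hypothetical.

```python
import torch

num_users, num_items, dim = 50, 200, 32
# Stand-in for trained, layer-averaged node embeddings:
# rows [0, num_users) are users, the rest are items.
emb = torch.randn(num_users + num_items, dim)

alice = 7                              # hypothetical user index
user_emb = emb[alice]
item_emb = emb[num_users:]             # all item embeddings

scores = item_emb @ user_emb           # one dot product per candidate item
recommended = scores.topk(5).indices   # top-5 candidates by score
```

In practice, items Alice has already purchased would be masked out before the top-k step.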

Scaling to production: PinSage

Pinterest's PinSage (Ying et al., 2018) proved that graph recommendations work at internet scale: 3 billion nodes, 18 billion edges. The key innovations:

  • Random-walk neighbor sampling: instead of using all neighbors, sample a fixed set based on random walk importance. This bounds computation per node.
  • Mini-batch training: train on subgraphs, not the full graph. PyG's NeighborLoader implements this directly.
  • Curriculum training: start with easy negatives (random items), progress to hard negatives (popular items in the same category).

Pinterest reported a 40% improvement in user engagement after deploying PinSage, demonstrating that graph structure provides signal that no amount of feature engineering on flat data can replicate.

Cold-start advantage

Matrix factorization cannot score new users or items with no interactions because they have no learned latent vector. Graph models solve this through connectivity:

  • A new user's demographic features connect them to similar user nodes
  • A new item's category and brand connect it to existing items in the graph
  • Message passing propagates information from these connections to generate embeddings

This is the cold-start problem solved structurally rather than through heuristics.
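As a simplified, hypothetical sketch of that mechanism: a single mean-aggregation step (the core of one message-passing layer) builds an embedding for a brand-new item purely from the trained embeddings of the metadata nodes it connects to. The values below are random stand-ins for trained embeddings.

```python
import torch

dim = 16
# Trained embeddings of existing entities the new item connects to
# through metadata edges (random stand-ins for illustration).
category_emb = torch.randn(dim)      # e.g. a "running shoes" category node
brand_emb = torch.randn(dim)         # the item's brand node
similar_items = torch.randn(4, dim)  # items sharing category and brand

# One mean-aggregation step: the new item's embedding is built entirely
# from its connections, despite zero interaction history.
neighbors = torch.cat([category_emb.unsqueeze(0),
                       brand_emb.unsqueeze(0),
                       similar_items])
new_item_emb = neighbors.mean(dim=0)   # shape: (16,)
```

A full GNN would apply learned weights per edge type rather than a plain mean, but the structural idea is the same: connectivity substitutes for missing interactions.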

Frequently asked questions

How do graph-based recommendations differ from collaborative filtering?

Collaborative filtering uses a user-item interaction matrix and factorizes it to find latent factors. Graph-based recommendations represent users, items, and their interactions as a graph, then use GNNs to propagate information across connections. The graph approach naturally incorporates side information (item categories, user demographics, social connections) and multi-hop paths (users who bought X also browsed Y, which was made by brand Z).

What is LightGCN and why is it popular for recommendations?

LightGCN is a simplified GNN designed specifically for collaborative filtering. It removes feature transformations and nonlinearities from standard GCN, keeping only neighborhood aggregation. The insight is that for recommendation, learning user and item embeddings through graph structure alone (without complex transformations) outperforms heavier models. It averages embeddings across layers rather than using only the final layer.

How do you handle the cold-start problem with graph recommendations?

Graph-based recommendations handle cold start better than matrix factorization because new users/items can be connected to the graph through side information: a new user's demographic profile connects them to similar users, a new product's category and brand connect it to existing items. GNNs propagate information through these connections to generate embeddings even without interaction history.

Can graph recommendations scale to millions of users and items?

Yes. PinSage (Pinterest) demonstrated graph-based recommendations at scale with 3 billion nodes and 18 billion edges. The key techniques are neighbor sampling (each node samples a fixed number of neighbors), mini-batching (train on subgraphs), and MapReduce-style distributed computation. PyG supports all of these through its NeighborLoader.

What accuracy improvements do graph recommendations provide?

On standard benchmarks (MovieLens, Amazon, Yelp), LightGCN improves Recall@20 by 10-16% over matrix factorization (BPR-MF). Adding side information through heterogeneous graphs provides further gains. Pinterest reported a 40% improvement in engagement after deploying PinSage. The improvement is largest when rich side information and multi-hop paths are available.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.