Graph-based recommendation systems outperform collaborative filtering by learning from the full network of user-item interactions, not just a sparse matrix. When you represent users, items, and their relationships as a graph, GNNs can propagate information across multi-hop paths: a user connects to purchased items, those items connect to their categories and other buyers, those buyers connect to their other purchases. This path structure is the recommendation signal.
The limitation of matrix factorization
Collaborative filtering represents interactions as a user-item matrix and decomposes it into low-rank factors. User u's preference for item i is the dot product of their latent vectors. This works well when the matrix is dense enough, but suffers from three structural limitations:
- Sparsity: Most users interact with fewer than 0.1% of items, leaving the matrix more than 99.9% empty.
- No side information: Item categories, brands, user demographics, and social connections must be engineered as additional features rather than being part of the model structure.
- No path reasoning: The model cannot learn that “users who buy X also browse Y, and Y is made by the same brand as Z” because it only sees direct user-item interactions.
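The dot-product scoring described above can be sketched in a few lines. The factor matrices here are randomly initialized stand-ins for factors that would in practice be learned by ALS or SGD on observed interactions:

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, rank = 100, 500, 16

# Low-rank factors (in practice learned, not random).
U = rng.normal(size=(num_users, rank))   # user latent vectors
V = rng.normal(size=(num_items, rank))   # item latent vectors

# User u's predicted preference for item i is a dot product.
u, i = 3, 42
score = U[u] @ V[i]

# Ranking all items for one user is a single matrix-vector product.
all_scores = V @ U[u]            # shape: (num_items,)
top10 = np.argsort(-all_scores)[:10]
```

Note that nothing here can see beyond the direct user-item entries: side information and multi-hop paths have no place in the factorization.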
The graph formulation
A recommendation graph is bipartite at its core: user nodes connect to item nodes through interaction edges (purchased, viewed, rated). Adding metadata makes it heterogeneous:
- User nodes: features include demographics, account age, activity level
- Item nodes: features include price, description embeddings, popularity
- Category/Brand nodes: item metadata becomes first-class entities
- Social edges: user-follows-user captures social influence
- Interaction edges: purchase, view, rating, add-to-cart with timestamps
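The structure above can be sketched with plain tensors, keyed by (source type, relation, destination type) in the style PyG's HeteroData uses. All node counts, feature dimensions, and edge lists below are illustrative:

```python
import torch

# Node features per type (sizes are illustrative).
node_features = {
    "user":  torch.randn(1000, 8),    # demographics, account age, activity
    "item":  torch.randn(5000, 32),   # price, description embedding, popularity
    "brand": torch.randn(200, 4),     # metadata promoted to first-class nodes
}

# Edges keyed by (source type, relation, destination type),
# each stored as a 2 x num_edges index tensor.
edge_index = {
    ("user", "purchased", "item"): torch.tensor([[0, 0, 1], [10, 42, 10]]),
    ("user", "follows", "user"):   torch.tensor([[0], [1]]),
    ("item", "made_by", "brand"):  torch.tensor([[10, 42], [5, 5]]),
}

# Per-edge attributes, e.g. interaction timestamps.
edge_time = {
    ("user", "purchased", "item"): torch.tensor([1000, 1001, 1002]),
}
```

A heterogeneous GNN would then learn separate message functions per relation type, so a "purchased" edge and a "made_by" edge contribute different signals.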
LightGCN: the baseline that beat everything
LightGCN (He et al., 2020) showed that for recommendation, simpler is better. It removes feature transformations and nonlinear activations from GCN, keeping only the core operation: neighborhood aggregation.
```python
import torch
from torch_geometric.nn import LGConv


class LightGCN(torch.nn.Module):
    def __init__(self, num_users, num_items, embedding_dim, num_layers):
        super().__init__()
        self.num_users = num_users
        # One embedding table for users and items; items are offset by num_users.
        self.embedding = torch.nn.Embedding(
            num_users + num_items, embedding_dim
        )
        self.convs = torch.nn.ModuleList(
            [LGConv() for _ in range(num_layers)]
        )

    def forward(self, edge_index):
        x = self.embedding.weight
        xs = [x]
        for conv in self.convs:
            x = conv(x, edge_index)
            xs.append(x)
        # Average across all layers (not just the final one).
        return torch.stack(xs, dim=0).mean(dim=0)
```

LightGCN averages embeddings across all layers: layer 0 is the raw embedding, layer 1 includes 1-hop neighbors, layer 2 includes 2-hop neighbors. The average captures multi-scale signals.
Why layer averaging works
Each layer captures a different scale of the graph. Layer 0 is the node's own embedding. Layer 1 incorporates direct neighbors (items a user interacted with). Layer 2 incorporates the neighbors of those items (other users who also interacted with them, plus their other items). Averaging across layers combines all these scales into a single representation.
Multi-hop paths are the recommendation signal
Consider recommending a product to user Alice. In the graph:
- 1-hop: Alice's directly purchased items (already known)
- 2-hop: Other users who bought the same items as Alice (similar users)
- 3-hop: Items those similar users bought that Alice has not seen yet (recommendations)
With 2-3 layers of message passing, Alice's embedding already encodes this 3-hop neighborhood. Scoring Alice against a candidate item is a single dot product. The entire multi-hop reasoning is baked into the embedding through graph structure.
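This can be made concrete in plain torch. The sketch below reimplements LightGCN-style propagation (symmetric normalization, then layer averaging) on a tiny invented bipartite graph, and scores user 0 ("Alice") against every item with one matrix-vector product; all sizes and edges are illustrative:

```python
import torch

torch.manual_seed(0)
num_users, num_items, dim, num_layers = 4, 5, 16, 2
n = num_users + num_items

# Bipartite interaction edges (user index, item index); Alice is user 0.
interactions = torch.tensor([[0, 0], [1, 0], [1, 1], [2, 2], [3, 2]])

# Symmetric adjacency over the joint user+item node set.
A = torch.zeros(n, n)
for u, i in interactions:
    A[u, num_users + i] = A[num_users + i, u] = 1.0

# Symmetric normalization D^-1/2 A D^-1/2, as in LightGCN's propagation rule.
deg = A.sum(dim=1).clamp(min=1)
A_hat = A / (deg.sqrt().unsqueeze(1) * deg.sqrt().unsqueeze(0))

x = torch.randn(n, dim)          # layer-0 embeddings
layers = [x]
for _ in range(num_layers):
    x = A_hat @ x                # each hop mixes in one more ring of neighbors
    layers.append(x)
final = torch.stack(layers).mean(dim=0)    # layer averaging

# Scoring Alice against all items is one matrix-vector product.
alice = final[0]
item_scores = final[num_users:] @ alice    # shape: (num_items,)
```

After two propagation steps, Alice's row already mixes in her items, their other buyers, and those buyers' items, so the cheap dot product at the end is doing 3-hop reasoning.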
Scaling to production: PinSage
Pinterest's PinSage (Ying et al., 2018) proved that graph recommendations work at internet scale: 3 billion nodes, 18 billion edges. The key innovations:
- Random-walk neighbor sampling: instead of using all neighbors, sample a fixed set based on random-walk importance. This bounds computation per node.
- Mini-batch training: train on subgraphs, not the full graph. PyG's NeighborLoader implements this directly.
- Curriculum training: start with easy negatives (random items), progress to hard negatives (popular items in the same category).
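The first idea can be sketched in a few lines of plain Python: run short random walks from a node, count visits, and keep only the most-visited nodes as its sampled neighborhood. This is a simplified illustration of PinSage-style importance sampling, with a toy adjacency list and walk parameters chosen arbitrarily:

```python
import random

def importance_neighbors(adj, node, num_walks=200, walk_len=3, top_k=2, seed=0):
    """Rank nodes near `node` by short-random-walk visit counts and
    keep the top_k most visited (PinSage-style importance sampling)."""
    rng = random.Random(seed)
    visits = {}
    for _ in range(num_walks):
        cur = node
        for _ in range(walk_len):
            nbrs = adj.get(cur)
            if not nbrs:
                break
            cur = rng.choice(nbrs)
            if cur != node:
                visits[cur] = visits.get(cur, 0) + 1
    ranked = sorted(visits, key=visits.get, reverse=True)
    return ranked[:top_k]

# Toy graph: "a" touches "b", "c", "d"; "b" is the best-connected hub.
adj = {"a": ["b", "c", "d"], "b": ["a", "c", "e"], "c": ["a", "b"],
       "d": ["a"], "e": ["b"]}
sampled = importance_neighbors(adj, "a")
```

Because visit counts weight neighbors by how reachable they are, well-connected hubs like "b" are kept while peripheral nodes are dropped, and each node's aggregation cost is capped at top_k regardless of its true degree.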
Pinterest reported a 40% improvement in user engagement after deploying PinSage, evidence that graph structure carries signal that is very hard to replicate with feature engineering on flat data.
Cold-start advantage
Matrix factorization cannot score new users or items with no interactions because they have no learned latent vector. Graph models solve this through connectivity:
- A new user's demographic features connect them to similar user nodes
- A new item's category and brand connect it to existing items in the graph
- Message passing propagates information from these connections to generate embeddings
This is the cold-start problem solved structurally rather than through heuristics.
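A minimal sketch of the item side of this, assuming the metadata embeddings are already learned: one round of mean aggregation over a new item's category and brand neighbors yields a usable starting embedding before any interaction exists. All tensors here are illustrative placeholders:

```python
import torch

torch.manual_seed(0)
dim = 8

# Embeddings already learned for existing metadata nodes.
category_emb = torch.randn(dim)   # e.g. the new item's category node
brand_emb = torch.randn(dim)      # e.g. the new item's brand node

# A brand-new item has no interactions, but it does have metadata edges.
# One mean-aggregation step over those neighbors produces its embedding:
# connectivity substitutes for interaction history.
new_item_emb = torch.stack([category_emb, brand_emb]).mean(dim=0)

# The new item can immediately be scored against any user embedding.
user_emb = torch.randn(dim)
score = new_item_emb @ user_emb
```

As interactions accumulate, message passing blends them in automatically; no separate cold-start heuristic ever needs to be switched off.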