
Shallow Embedding vs GNN: Lookup Tables (Node2Vec) vs Parametric Learning (GCN)

Shallow embeddings memorize a vector for each node. GNNs compute a vector from each node's features and neighborhood. The difference is fundamental: memorization vs generalization, transductive vs inductive, static vs dynamic.


TL;DR

  • Shallow embeddings (Node2Vec, DeepWalk, LINE): each node gets a fixed vector in a lookup table. Learned by optimizing random walk co-occurrence. Fast to train, fast at inference (just a lookup).
  • GNN embeddings (GCN, GAT, GraphSAGE): each node's vector is computed dynamically from its features and neighborhood through message passing. Parametric: a neural network generates embeddings.
  • The key tradeoff: shallow embeddings are transductive (cannot embed new nodes). GNNs are inductive (can embed any node with features and connections). For dynamic graphs where new nodes arrive continuously, GNNs are necessary.
  • Shallow embeddings ignore node features entirely (only graph structure). GNNs combine node features with graph structure. When features carry signal, GNNs significantly outperform shallow methods.
  • Hybrid approaches initialize GNN node embeddings with shallow embedding vectors, combining structural pre-training with feature-based learning.

Shallow embeddings and GNNs are two fundamentally different ways to represent nodes as vectors. Shallow methods (Node2Vec, DeepWalk, LINE) treat the embedding as a lookup table: node 42 has a fixed vector that is learned during training. GNNs (GCN, GAT, GraphSAGE) treat the embedding as a computation: node 42's vector is computed from its features and its neighbors' features through message passing.

Shallow embeddings: the lookup table

A shallow embedding method maintains an embedding matrix Z of shape [num_nodes, embedding_dim]. Node i's embedding is simply row i of this matrix. Training optimizes these vectors so that structurally similar nodes (nodes that co-occur in random walks) have similar embeddings.

shallow_embedding.py
import torch
from torch_geometric.nn import Node2Vec

# Create Node2Vec model
model = Node2Vec(
    edge_index,
    embedding_dim=128,
    walk_length=20,
    context_size=10,
    walks_per_node=10,
    p=1.0,  # return parameter
    q=1.0,  # in-out parameter
    num_nodes=num_nodes,
)

# Sample positive & negative random walks in batches
loader = model.loader(batch_size=128, shuffle=True)

# Train: optimize random walk co-occurrence
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(100):
    for pos_rw, neg_rw in loader:
        optimizer.zero_grad()
        loss = model.loss(pos_rw, neg_rw)
        loss.backward()
        optimizer.step()

# Inference: just a lookup
embedding = model(torch.tensor([42]))  # node 42's vector

Node2Vec learns a fixed vector per node by optimizing random walk co-occurrence. Inference is a simple lookup: O(1) per node.

Strengths

  • Simple: no neural network, no message passing. Just an embedding table and an optimization objective.
  • Fast inference: looking up a vector is O(1). No computation graph needed.
  • Structure-only: captures graph topology without needing node features.

Weaknesses

  • Transductive: new nodes have no embedding. You must retrain to embed them.
  • No features: ignores node attributes entirely. If features carry signal, it is lost.
  • Memory: stores one vector per node. For graphs with 100M nodes, the embedding table alone requires ~50 GB at 128 dimensions.
  • Static: if the graph changes (new edges, updated features), embeddings are stale until retrained.
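The memory figure in the bullet above is simple arithmetic to verify, assuming float32 storage (4 bytes per value):

```python
# Back-of-envelope size of a shallow embedding table at float32.
num_nodes = 100_000_000
embedding_dim = 128
table_bytes = num_nodes * embedding_dim * 4  # 4 bytes per float32
print(table_bytes / 1e9)  # 51.2 GB
```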

GNN embeddings: the computation

A GNN computes embeddings through message passing. Node i's embedding depends on its own features and the features of its k-hop neighborhood.
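One round of message passing can be sketched framework-free. The toy graph, feature values, and the simple "self plus neighbor mean" combine rule below are illustrative only; real layers like GCNConv apply learned weight matrices and degree normalization:

```python
# Toy one-layer message passing with mean aggregation.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

def embed(node):
    msgs = [features[n] for n in neighbors[node]]
    # aggregate: mean over neighbor features, dimension by dimension
    agg = [sum(dim) / len(msgs) for dim in zip(*msgs)]
    # update: combine the node's own features with the aggregate
    return [s + a for s, a in zip(features[node], agg)]

print(embed("a"))  # [1.5, 1.0]
```

Stacking k such rounds makes each node's embedding a function of its k-hop neighborhood, which is why the result changes automatically when neighbors or features change.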

Strengths

  • Inductive: any node with features and connections can be embedded, even if never seen during training. Critical for the cold-start problem.
  • Feature-aware: combines node attributes with graph structure. When a customer's age, location, and purchase history all matter, GNNs use them all.
  • Dynamic: embeddings update automatically when the graph changes. New neighbors? New features? The message passing computation produces updated embeddings.
  • Parameter-efficient: a GNN with 1M parameters can embed 100M nodes. Shallow embeddings need 100M x 128 = 12.8B parameters.
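The parameter-efficiency claim in the last bullet is also just arithmetic (the 1M-parameter GNN is an example size, not a requirement):

```python
# Parameters needed to cover 100M nodes.
shallow_params = 100_000_000 * 128  # one 128-dim row per node
gnn_params = 1_000_000              # fixed, no matter how the graph grows
print(shallow_params, shallow_params / gnn_params)  # 12800000000 12800.0
```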

Weaknesses

  • Slower inference: must run message passing (neighbor aggregation) at inference time.
  • Requires features: if nodes have no meaningful features, GNNs have less to work with (though structural features like degree can substitute).

When to use which

  • Shallow embeddings: static graphs, no node features, structure-only tasks (community detection, link prediction on fixed networks), computational simplicity.
  • GNNs: dynamic graphs with new nodes arriving, rich node features, enterprise applications where features and structure both carry signal, production systems requiring generalization.
  • Hybrid: use shallow embeddings as initial node features for a GNN. This combines structural pre-training (Node2Vec captures topology) with feature-based learning (GNN incorporates attributes and multi-hop patterns).
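The hybrid setup amounts to building the GNN's input features by concatenation. A toy sketch with illustrative vectors (in practice the structural rows would come from a trained Node2Vec embedding table):

```python
# Hybrid input construction: structural vector + raw attributes per node.
structural = {0: [0.1, -0.3], 1: [0.7, 0.2]}   # e.g. Node2Vec output rows
attributes = {0: [34.0, 1.0], 1: [28.0, 0.0]}  # e.g. age, segment flag

# The concatenation becomes the GNN's input feature matrix
gnn_input = {n: structural[n] + attributes[n] for n in structural}
print(gnn_input[0])  # [0.1, -0.3, 34.0, 1.0]
```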

Frequently asked questions

What is a shallow embedding?

A shallow embedding assigns a fixed, learnable vector to each node in the graph. The embedding is a lookup table: node 42 maps to vector [0.1, -0.3, 0.7, ...]. These vectors are learned by optimizing a random walk or proximity objective: nodes that co-occur in random walks should have similar embeddings. Node2Vec, DeepWalk, and LINE are shallow embedding methods.

What is a GNN embedding?

A GNN computes embeddings dynamically by running message passing on node features and graph structure. Node 42's embedding depends on its own features AND the features and structure of its neighbors. The embedding is parametric: it is computed by a neural network (the GNN), not looked up from a table. If node 42's neighbors change, its embedding changes automatically.

What does transductive vs inductive mean?

Transductive: the model can only embed nodes it saw during training. Shallow embeddings are transductive because each node has a fixed entry in the lookup table; new nodes have no entry. Inductive: the model can embed new, unseen nodes. GNNs are inductive because they compute embeddings from features and structure; any node with features and connections can be embedded, even if it was not in the training graph.
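The contrast can be made concrete with a toy sketch (names and values illustrative): a lookup table fails on an unseen node, while a feature-based computation handles it:

```python
# Transductive lookup vs inductive computation.
table = {0: [0.1, 0.2], 1: [0.3, 0.4]}  # learned during training

def shallow_embed(node):
    return table[node]  # node 2 was never trained: raises KeyError

def gnn_embed(feats, neighbor_feats):
    # any node with features and at least one neighbor is embeddable
    agg = [sum(dim) / len(neighbor_feats) for dim in zip(*neighbor_feats)]
    return [f + a for f, a in zip(feats, agg)]

# A brand-new node with features [1.0, 0.0] and one neighbor:
print(gnn_embed([1.0, 0.0], [[0.0, 2.0]]))  # [1.0, 2.0]
```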

When should you use shallow embeddings over GNNs?

Shallow embeddings are appropriate when: (1) the graph is static and all nodes are known at training time, (2) node features are absent or uninformative, (3) the task is link prediction or node clustering where structural similarity is the primary signal, and (4) computational simplicity matters (no message passing overhead at inference). They are fast and effective for static graph analysis.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.