Shallow embeddings and GNNs are two fundamentally different ways to represent nodes as vectors. Shallow methods (Node2Vec, DeepWalk, LINE) treat the embedding as a lookup table: node 42 has a fixed vector that is learned during training. GNNs (GCN, GAT, GraphSAGE) treat the embedding as a computation: node 42's vector is computed from its features and its neighbors' features through message passing.
Shallow embeddings: the lookup table
A shallow embedding method maintains an embedding matrix Z of shape [num_nodes, embedding_dim]. Node i's embedding is simply row i of this matrix. Training optimizes these vectors so that structurally similar nodes (nodes that co-occur in random walks) end up with similar embeddings.
import torch
from torch_geometric.nn import Node2Vec

# Create Node2Vec model
model = Node2Vec(
    edge_index,
    embedding_dim=128,
    walk_length=20,
    context_size=10,
    walks_per_node=10,
    p=1.0,  # return parameter
    q=1.0,  # in-out parameter
    num_nodes=num_nodes,
)
loader = model.loader(batch_size=128, shuffle=True)  # samples the random walks

# Train: optimize random walk co-occurrence
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(100):
    for pos_rw, neg_rw in loader:  # positive & negative walks
        optimizer.zero_grad()
        loss = model.loss(pos_rw, neg_rw)
        loss.backward()
        optimizer.step()

# Inference: just a lookup
embedding = model(torch.tensor([42]))  # node 42's vector

Node2Vec learns a fixed vector per node by optimizing random walk co-occurrence. Inference is a simple lookup: O(1) per node.
Strengths
- Simple: no neural network, no message passing. Just an embedding table and an optimization objective.
- Fast inference: looking up a vector is O(1). No computation graph needed.
- Structure-only: captures graph topology without needing node features.
Weaknesses
- Transductive: new nodes have no embedding. You must retrain to embed them.
- No features: ignores node attributes entirely. If features carry signal, it is lost.
- Memory: stores one vector per node. For graphs with 100M nodes, the embedding table alone requires ~50 GB at 128 dimensions.
- Static: if the graph changes (new edges, updated features), embeddings are stale until retrained.
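The memory figure in the list above falls out of simple arithmetic (this sketch assumes float32 vectors; half precision would halve it):

```python
num_nodes = 100_000_000
dim = 128
bytes_per_float = 4  # float32

# Total size of the embedding table alone, before optimizer state
table_bytes = num_nodes * dim * bytes_per_float
print(table_bytes / 1e9)  # 51.2 GB
```

Optimizers like Adam keep additional per-parameter state, so the training footprint is typically 2-3x this.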
GNN embeddings: the computation
A GNN computes embeddings through message passing. Node i's embedding depends on its own features and the features of its k-hop neighborhood: at each layer, a node aggregates its neighbors' current representations and combines them with its own, so stacking k layers gives each node a k-hop receptive field.
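One round of message passing can be sketched in plain PyTorch (the function name and toy graph here are illustrative; real GNN layers such as GCNConv or SAGEConv add learned weight matrices and nonlinearities on top of the aggregation):

```python
import torch

def mean_aggregate(x, edge_index):
    """One message passing round: each node averages its in-neighbors' features."""
    src, dst = edge_index                        # edges point src -> dst
    agg = torch.zeros_like(x)
    agg.index_add_(0, dst, x[src])               # sum incoming neighbor features
    deg = torch.zeros(x.size(0))
    deg.index_add_(0, dst, torch.ones(src.size(0)))
    return agg / deg.clamp(min=1).unsqueeze(-1)  # divide by degree -> mean

# Toy graph: edges 0 -> 2 and 1 -> 2, so node 2 aggregates from nodes 0 and 1
x = torch.tensor([[1.0, 0.0], [3.0, 2.0], [0.0, 0.0]])
edge_index = torch.tensor([[0, 1], [2, 2]])
h = mean_aggregate(x, edge_index)
# h[2] is the mean of x[0] and x[1]: [2.0, 1.0]
```

Nothing here is stored per node: rerun the function with new features or new edges and the embeddings update, which is exactly the inductive property described below.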
Strengths
- Inductive: any node with features and connections can be embedded, even if never seen during training. Critical for the cold-start problem.
- Feature-aware: combines node attributes with graph structure. When a customer's age, location, and purchase history all matter, GNNs use them all.
- Dynamic: embeddings update automatically when the graph changes. New neighbors? New features? The message passing computation produces updated embeddings.
- Parameter-efficient: a GNN with 1M parameters can embed 100M nodes. Shallow embeddings need 100M x 128 = 12.8B parameters.
Weaknesses
- Slower inference: must run message passing (neighbor aggregation) at inference time.
- Requires features: if nodes have no meaningful features, GNNs have less to work with (though structural features like degree can substitute).
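When node attributes are missing, one common fallback (sketched here in plain PyTorch; the one-hot degree encoding is just one of several structural-feature options) is to derive features from the topology itself:

```python
import torch

def degree_features(edge_index, num_nodes, max_degree=10):
    """Build one-hot in-degree features for featureless nodes."""
    deg = torch.zeros(num_nodes, dtype=torch.long)
    deg.index_add_(0, edge_index[1],
                   torch.ones(edge_index.size(1), dtype=torch.long))
    deg = deg.clamp(max=max_degree)  # cap rare high degrees in one bucket
    return torch.nn.functional.one_hot(deg, num_classes=max_degree + 1).float()

# Toy graph: edges 0 -> 1, 1 -> 2, 2 -> 1; node 1 has in-degree 2
edge_index = torch.tensor([[0, 1, 2], [1, 2, 1]])
x = degree_features(edge_index, num_nodes=3)  # shape [3, 11]
```

These synthetic features then feed into the GNN exactly as real attributes would.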
When to use which
- Shallow embeddings: static graphs, no node features, structure-only tasks (community detection, link prediction on fixed networks), computational simplicity.
- GNNs: dynamic graphs with new nodes arriving, rich node features, enterprise applications where features and structure both carry signal, production systems requiring generalization.
- Hybrid: use shallow embeddings as initial node features for a GNN. This combines structural pre-training (Node2Vec captures topology) with feature-based learning (GNN incorporates attributes and multi-hop patterns).
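The hybrid wiring is a one-line concatenation (a sketch: the random tensors below are stand-ins for a trained Node2Vec table and real node attributes, and the dimensions are arbitrary):

```python
import torch

num_nodes, n2v_dim, feat_dim = 100, 64, 16

# Stand-ins: in practice these come from a trained Node2Vec model
# (its embedding table, frozen) and from the dataset's node attributes
node2vec_z = torch.randn(num_nodes, n2v_dim)
raw_x = torch.randn(num_nodes, feat_dim)

# Each node's GNN input now carries structure (Node2Vec) + attributes
x = torch.cat([node2vec_z.detach(), raw_x], dim=-1)  # shape [100, 80]
# x then becomes the input feature matrix for any feature-taking GNN
```

Detaching the shallow embeddings keeps the structural pre-training frozen; alternatively, they can be fine-tuned end to end with the GNN.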