
Shallow Embedding vs GNN: Lookup Tables (Node2Vec) vs Parametric Learning (GCN)

Shallow embeddings memorize a vector for each node. GNNs compute a vector from each node's features and neighborhood. The difference is fundamental: memorization vs generalization, transductive vs inductive, static vs dynamic.


TL;DR

  • Shallow embeddings (Node2Vec, DeepWalk, LINE): each node gets a fixed vector in a lookup table. Learned by optimizing random walk co-occurrence. Fast to train, fast at inference (just a lookup).
  • GNN embeddings (GCN, GAT, GraphSAGE): each node's vector is computed dynamically from its features and neighborhood through message passing. Parametric: a neural network generates embeddings.
  • The key tradeoff: shallow embeddings are transductive (cannot embed new nodes). GNNs are inductive (can embed any node with features and connections). For dynamic graphs where new nodes arrive continuously, GNNs are necessary.
  • Shallow embeddings ignore node features entirely (only graph structure). GNNs combine node features with graph structure. When features carry signal, GNNs significantly outperform shallow methods.
  • Hybrid approaches initialize GNN node embeddings with shallow embedding vectors, combining structural pre-training with feature-based learning.

Shallow embeddings and GNNs are two fundamentally different ways to represent nodes as vectors. Shallow methods (Node2Vec, DeepWalk, LINE) treat the embedding as a lookup table: node 42 has a fixed vector that is learned during training. GNNs (GCN, GAT, GraphSAGE) treat the embedding as a computation: node 42's vector is computed from its features and its neighbors' features through message passing.

Shallow embeddings: the lookup table

A shallow embedding method maintains an embedding matrix Z of shape [num_nodes, embedding_dim]. Node i's embedding is simply row i of this matrix. Training optimizes these vectors so that structurally similar nodes (nodes that co-occur in random walks) have similar embeddings.

shallow_embedding.py
import torch
from torch_geometric.nn import Node2Vec

# Create Node2Vec model
model = Node2Vec(
    edge_index,
    embedding_dim=128,
    walk_length=20,
    context_size=10,
    walks_per_node=10,
    p=1.0,  # return parameter
    q=1.0,  # in-out parameter
    num_nodes=num_nodes,
)

# Sample positive & negative random walks in batches
loader = model.loader(batch_size=128, shuffle=True)

# Train: optimize random walk co-occurrence
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(100):
    for pos_rw, neg_rw in loader:
        optimizer.zero_grad()
        loss = model.loss(pos_rw, neg_rw)
        loss.backward()
        optimizer.step()

# Inference: just a lookup
embedding = model(torch.tensor([42]))  # node 42's vector

Node2Vec learns a fixed vector per node by optimizing random walk co-occurrence. Inference is a simple lookup: O(1) per node.

Strengths

  • Simple: no neural network, no message passing. Just an embedding table and an optimization objective.
  • Fast inference: looking up a vector is O(1). No computation graph needed.
  • Structure-only: captures graph topology without needing node features.

Weaknesses

  • Transductive: new nodes have no embedding. You must retrain to embed them.
  • No features: ignores node attributes entirely. If features carry signal, it is lost.
  • Memory: stores one vector per node. For graphs with 100M nodes, the embedding table alone requires ~50 GB at 128 dimensions.
  • Static: if the graph changes (new edges, updated features), embeddings are stale until retrained.
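The memory figure in the bullet above is simple arithmetic to verify, assuming float32 storage (4 bytes per value):

```python
# Back-of-envelope size of a shallow embedding table at float32.
num_nodes = 100_000_000
embedding_dim = 128
table_bytes = num_nodes * embedding_dim * 4  # 4 bytes per float32
print(table_bytes / 1e9)  # 51.2 GB
```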

GNN embeddings: the computation

A GNN computes embeddings through message passing. Node i's embedding depends on its own features and the features of its k-hop neighborhood.
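One round of message passing can be sketched framework-free. The toy graph, feature values, and the simple "self plus neighbor mean" combine rule below are illustrative only; real layers like GCNConv apply learned weight matrices and degree normalization:

```python
# Toy one-layer message passing with mean aggregation.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

def embed(node):
    msgs = [features[n] for n in neighbors[node]]
    # aggregate: mean over neighbor features, dimension by dimension
    agg = [sum(dim) / len(msgs) for dim in zip(*msgs)]
    # update: combine the node's own features with the aggregate
    return [s + a for s, a in zip(features[node], agg)]

print(embed("a"))  # [1.5, 1.0]
```

Stacking k such rounds makes each node's embedding a function of its k-hop neighborhood, which is why the result changes automatically when neighbors or features change.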

Strengths

  • Inductive: any node with features and connections can be embedded, even if never seen during training. Critical for the cold-start problem.
  • Feature-aware: combines node attributes with graph structure. When a customer's age, location, and purchase history all matter, GNNs use them all.
  • Dynamic: embeddings update automatically when the graph changes. New neighbors? New features? The message passing computation produces updated embeddings.
  • Parameter-efficient: a GNN with 1M parameters can embed 100M nodes. Shallow embeddings need 100M x 128 = 12.8B parameters.
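The parameter-efficiency claim in the last bullet is also just arithmetic (the 1M-parameter GNN is an example size, not a requirement):

```python
# Parameters needed to cover 100M nodes.
shallow_params = 100_000_000 * 128  # one 128-dim row per node
gnn_params = 1_000_000              # fixed, no matter how the graph grows
print(shallow_params, shallow_params / gnn_params)  # 12800000000 12800.0
```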

Weaknesses

  • Slower inference: must run message passing (neighbor aggregation) at inference time.
  • Requires features: if nodes have no meaningful features, GNNs have less to work with (though structural features like degree can substitute).

When to use which

  • Shallow embeddings: static graphs, no node features, structure-only tasks (community detection, link prediction on fixed networks), computational simplicity.
  • GNNs: dynamic graphs with new nodes arriving, rich node features, enterprise applications where features and structure both carry signal, production systems requiring generalization.
  • Hybrid: use shallow embeddings as initial node features for a GNN. This combines structural pre-training (Node2Vec captures topology) with feature-based learning (GNN incorporates attributes and multi-hop patterns).
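The hybrid setup amounts to building the GNN's input features by concatenation. A toy sketch with illustrative vectors (in practice the structural rows would come from a trained Node2Vec embedding table):

```python
# Hybrid input construction: structural vector + raw attributes per node.
structural = {0: [0.1, -0.3], 1: [0.7, 0.2]}   # e.g. Node2Vec output rows
attributes = {0: [34.0, 1.0], 1: [28.0, 0.0]}  # e.g. age, segment flag

# The concatenation becomes the GNN's input feature matrix
gnn_input = {n: structural[n] + attributes[n] for n in structural}
print(gnn_input[0])  # [0.1, -0.3, 34.0, 1.0]
```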

Frequently asked questions

What is a shallow embedding?

A shallow embedding assigns a fixed, learnable vector to each node in the graph. The embedding is a lookup table: node 42 maps to vector [0.1, -0.3, 0.7, ...]. These vectors are learned by optimizing a random walk or proximity objective: nodes that co-occur in random walks should have similar embeddings. Node2Vec, DeepWalk, and LINE are shallow embedding methods.

What is a GNN embedding?

A GNN computes embeddings dynamically by running message passing on node features and graph structure. Node 42's embedding depends on its own features AND the features and structure of its neighbors. The embedding is parametric: it is computed by a neural network (the GNN), not looked up from a table. If node 42's neighbors change, its embedding changes automatically.

What does transductive vs inductive mean?

Transductive: the model can only embed nodes it saw during training. Shallow embeddings are transductive because each node has a fixed entry in the lookup table; new nodes have no entry. Inductive: the model can embed new, unseen nodes. GNNs are inductive because they compute embeddings from features and structure; any node with features and connections can be embedded, even if it was not in the training graph.
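The contrast can be made concrete with a toy sketch (names and values illustrative): a lookup table fails on an unseen node, while a feature-based computation handles it:

```python
# Transductive lookup vs inductive computation.
table = {0: [0.1, 0.2], 1: [0.3, 0.4]}  # learned during training

def shallow_embed(node):
    return table[node]  # node 2 was never trained: raises KeyError

def gnn_embed(feats, neighbor_feats):
    # any node with features and at least one neighbor is embeddable
    agg = [sum(dim) / len(neighbor_feats) for dim in zip(*neighbor_feats)]
    return [f + a for f, a in zip(feats, agg)]

# A brand-new node with features [1.0, 0.0] and one neighbor:
print(gnn_embed([1.0, 0.0], [[0.0, 2.0]]))  # [1.0, 2.0]
```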

When should you use shallow embeddings over GNNs?

Shallow embeddings are appropriate when: (1) the graph is static and all nodes are known at training time, (2) node features are absent or uninformative, (3) the task is link prediction or node clustering where structural similarity is the primary signal, and (4) computational simplicity matters (no message passing overhead at inference). They are fast and effective for static graph analysis.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.