Positional encoding for graphs assigns each node a feature vector that encodes its position within the graph structure, providing the structural context that message passing GNNs and graph transformers need to distinguish nodes with identical local neighborhoods. In sequences, position is the index (word 1, word 2, word 3). In images, position is the pixel coordinate. In graphs, there is no natural ordering. Positional encodings fill this gap by computing position-like features from the graph topology itself.
Why it matters for enterprise data
In enterprise relational databases, structural position carries signal. A customer at the center of a large social network behaves differently from a peripheral customer, even if their row-level features are identical. A product that is a hub in the co-purchase graph (connected to many other products) has different demand dynamics than a niche product.
Positional encodings make these structural roles visible to the model. Without them, two customers with identical features and identical 1-hop neighborhoods (but different 2-hop+ structures) would receive identical GNN representations. With positional encodings, the model can distinguish their structural contexts and make better predictions.
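This failure mode can be demonstrated with a toy sketch (the graphs, `cycle_adj`, and the parameter-free mean-aggregation layer are illustrative, not from any library): every node in a 6-cycle and in two disjoint triangles has degree 2 and identical features, so message passing alone cannot tell the two structures apart, while a random-walk return probability can.

```python
import torch

# Toy counterexample: a 6-cycle vs. two disjoint triangles.
def cycle_adj(n: int) -> torch.Tensor:
    adj = torch.zeros(n, n)
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
    return adj

six_cycle = cycle_adj(6)
two_triangles = torch.block_diag(cycle_adj(3), cycle_adj(3))
x = torch.ones(6, 1)  # identical node features everywhere

def mean_layer(x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    # Parameter-free mean aggregation over neighbors
    return adj @ x / adj.sum(1, keepdim=True)

h_cycle, h_tri = x, x
for _ in range(3):  # any depth gives the same result
    h_cycle = mean_layer(h_cycle, six_cycle)
    h_tri = mean_layer(h_tri, two_triangles)
assert torch.allclose(h_cycle, h_tri)  # indistinguishable without PE

# A positional signal separates them: the 3-step random-walk return
# probability is 0.25 on a triangle but 0 on the 6-cycle.
P_c, P_t = six_cycle / 2.0, two_triangles / 2.0
r_cycle = (P_c @ P_c @ P_c).diagonal()  # all zeros
r_tri = (P_t @ P_t @ P_t).diagonal()    # all 0.25
```

The same 1-WL limitation applies to learned GNN layers, which is why the 2-hop+ differences mentioned above are invisible without positional features.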
Laplacian positional encoding (LapPE)
Uses the eigenvectors corresponding to the k smallest non-trivial eigenvalues of the graph Laplacian as node features. Each node gets a k-dimensional vector representing its spectral position.
import torch
from torch_geometric.transforms import AddLaplacianEigenvectorPE
# Add 16-dimensional Laplacian PE to node features
transform = AddLaplacianEigenvectorPE(k=16, attr_name='laplacian_pe')
data = transform(data)
# data.laplacian_pe: [num_nodes, 16]
# Concatenate with original features
data.x = torch.cat([data.x, data.laplacian_pe], dim=-1)
# Now each node has original features + 16 spectral position features
# Nodes close in the graph have similar PE values
# Nodes far apart have different PE values

AddLaplacianEigenvectorPE computes the eigenvectors for the k smallest non-trivial eigenvalues and attaches them as node features. Concatenate them with the original features before training.
Sign ambiguity
Eigenvectors have a sign ambiguity: if v is an eigenvector, -v is also an eigenvector. Different runs may flip signs. Solutions: use absolute values, apply a sign-invariant network, or use random sign flipping as data augmentation during training.
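The sign-flip augmentation can be sketched in a few lines (`flip_pe_signs` is an illustrative helper, not a library function): each training step, multiply every eigenvector column by an independent random sign so the model learns to be invariant to the arbitrary choice.

```python
import torch

def flip_pe_signs(pe: torch.Tensor) -> torch.Tensor:
    # One random +1/-1 per eigenvector dimension, shared across all nodes
    signs = torch.randint(0, 2, (pe.size(-1),)).to(pe.dtype) * 2 - 1
    return pe * signs

# Apply per batch during training, not at evaluation time
pe = torch.randn(5, 4)            # 5 nodes, 4-dim Laplacian PE
augmented = flip_pe_signs(pe)
assert augmented.abs().equal(pe.abs())  # magnitudes unchanged, only signs flip
```

Note the sign is shared across nodes within a column: flipping individual entries would destroy the eigenvector, while flipping whole columns only exercises the ambiguity.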
Random walk positional encoding (RWPE)
Encodes each node by the probability that a random walk starting from that node returns to it after 1, 2, ..., K steps.
from torch_geometric.transforms import AddRandomWalkPE
# Add 16-step random walk PE
transform = AddRandomWalkPE(walk_length=16, attr_name='rwpe')
data = transform(data)
# data.rwpe[i] = [P(return to i in 1 step), P(return in 2 steps), ..., P(return in 16 steps)]
# Dense clusters: high return probabilities (walks stay local)
# Bridges/periphery: low return probabilities (walks escape quickly)

RWPE captures local topology through return probabilities. No eigendecomposition needed, making it more scalable than LapPE for large graphs.
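To make the return-probability definition concrete, here is a minimal dense-matrix sketch (the `rwpe` helper and toy graph are illustrative; the PyG transform uses sparse operations):

```python
import torch

def rwpe(adj: torch.Tensor, walk_length: int) -> torch.Tensor:
    # pe[i, k] = probability a walk from node i is back at i after k+1 steps
    deg = adj.sum(1, keepdim=True).clamp(min=1)
    P = adj / deg                       # row-stochastic transition matrix
    Pk, cols = P, []
    for _ in range(walk_length):
        cols.append(Pk.diagonal())      # diagonal holds return probabilities
        Pk = Pk @ P
    return torch.stack(cols, dim=1)     # [num_nodes, walk_length]

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2
adj = torch.tensor([[0., 1., 1., 0.],
                    [1., 0., 1., 0.],
                    [1., 1., 0., 1.],
                    [0., 0., 1., 0.]])
pe = rwpe(adj, walk_length=4)
# 2-step returns: triangle node 0 gets 5/12 ~ 0.417, pendant node 3 gets 1/3,
# matching the intuition that dense regions have higher return probabilities
```

The 1-step column is zero for any simple graph (no self-loops), so the informative signal starts at step 2.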
Concrete example: distinguishing hub vs. peripheral customers
Two customers with identical features (same age, income, tenure) and identical 1-hop neighborhoods (same number of orders, similar amounts):
- Customer A: connected to merchants that many other customers also use (hub in the bipartite graph). RWPE shows high return probability (walks cycle through dense co-purchase patterns).
- Customer B: connected to niche merchants with few other customers (peripheral). RWPE shows low return probability (walks quickly reach dead ends).
Without PE, a standard GNN assigns identical representations. With PE, the model distinguishes them and can learn that hub customers churn less (strong network stickiness) while peripheral customers churn more.
Limitations and what comes next
- LapPE scalability: Eigendecomposition is O(n^3). For million-node enterprise graphs, approximate methods (randomized SVD, iterative solvers) are needed. RWPE scales better (O(K * |E|)).
- Sign ambiguity (LapPE): Eigenvector signs are arbitrary. This requires additional handling (sign-invariant networks, augmentation) that adds complexity.
- Not a complete solution: PE helps with expressiveness but does not solve over-smoothing or over-squashing. These require architectural changes (skip connections, graph rewiring, transformers).
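For the scalability point, one option is an iterative sparse eigensolver in place of a dense eigendecomposition. A sketch using SciPy's Lanczos routine (`lap_pe_sparse` is a hypothetical helper; `which='SM'` is the simple choice here, though shift-invert via `sigma=0` is often faster on large graphs):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def lap_pe_sparse(adj: sp.csr_matrix, k: int) -> np.ndarray:
    # Normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1))
    D = sp.diags(d_inv_sqrt)
    lap = sp.eye(adj.shape[0]) - D @ adj @ D
    # k+1 smallest eigenpairs; drop the trivial near-zero eigenvector
    vals, vecs = eigsh(lap, k=k + 1, which='SM')
    order = np.argsort(vals)
    return vecs[:, order[1:k + 1]]

# Toy usage: ring graph with 20 nodes
n = 20
rows = np.arange(n)
cols = (rows + 1) % n
adj = sp.csr_matrix((np.ones(n), (rows, cols)), shape=(n, n))
adj = adj + adj.T
pe = lap_pe_sparse(adj, k=4)   # [20, 4] spectral coordinates
```

The Lanczos iteration touches only nonzero entries, so cost scales with |E| and the number of requested eigenpairs rather than n^3.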