PyG / Guide · 7 min read

Structural Encoding: Encoding Local Graph Topology Around Each Node

Structural encodings give GNNs explicit information about local graph topology that message passing alone cannot capture: degree, triangle counts, clustering coefficients, and motif frequencies. They are the simplest way to boost GNN expressiveness.

PyTorch Geometric

TL;DR

  • Structural encodings are node features computed from local graph topology: degree, triangle count, clustering coefficient, centrality measures. They provide explicit structural information.
  • Different from positional encodings: structural = what your neighborhood looks like (local topology). Positional = where you are in the graph (global position). They are complementary.
  • Motivation: 1-WL-bounded GNNs cannot count certain substructures (triangles, cycles). Structural encodings provide this as direct input features, bypassing the expressiveness limit.
  • For enterprise data: degree captures customer activity volume, triangle count reveals fraud rings, clustering coefficient indicates community membership, centrality identifies influential nodes.
  • Computation: one-time preprocessing step. Concatenate with node features before training. Degree is O(|E|), triangles are O(|E|^1.5), centrality varies by metric.

Structural encodings are additional node features computed from the local graph topology around each node, providing GNNs with explicit information about graph structure that message passing alone cannot capture within its expressiveness limits. They include simple statistics like node degree and clustering coefficient, as well as more complex features like triangle counts, motif frequencies, and centrality measures. By providing this information as direct input features, structural encodings push GNN expressiveness beyond the 1-WL bound without changing the GNN architecture.

Why it matters for enterprise data

Enterprise relational graphs contain structural patterns that carry strong predictive signal:

  • Node degree: A customer with 1,000 transactions behaves differently from one with 5. Degree captures activity volume.
  • Triangle count: Fraud rings form triangles (A → B → C → A). The number of triangles a node participates in is a direct fraud indicator.
  • Clustering coefficient: Nodes in tight communities (high clustering) have different churn patterns than peripheral nodes (low clustering).
  • Betweenness centrality: Bridge nodes that connect different customer segments are structurally important and may have unique behavior.

Standard message passing cannot count triangles (a known 1-WL limitation). Adding triangle count as a structural encoding gives the fraud detection GNN direct access to this critical signal.
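The 1-WL limitation can be checked directly in plain Python, with no GNN involved. The sketch below (function names like wl_colors are illustrative, not from any library) runs 1-WL color refinement on two classic indistinguishable graphs, a 6-cycle versus two disjoint triangles, and then shows that per-node triangle counts separate them immediately:

```python
from itertools import combinations

def neighbors(edges, n):
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def wl_colors(adj, rounds=3):
    # 1-WL color refinement: start from degrees, then repeatedly
    # hash each node's color together with its neighbors' colors.
    colors = {v: len(nb) for v, nb in adj.items()}
    for _ in range(rounds):
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return sorted(colors.values())   # color histogram of the graph

def triangle_count(adj, v):
    # Number of triangles node v participates in.
    return sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])

c6     = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]   # one 6-cycle
two_c3 = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]   # two triangles

adj_a, adj_b = neighbors(c6, 6), neighbors(two_c3, 6)

# 1-WL sees identical color histograms: every node is degree 2
# with degree-2 neighbors, in both graphs.
print(wl_colors(adj_a) == wl_colors(adj_b))            # True

# Triangle counts separate the graphs immediately.
print([triangle_count(adj_a, v) for v in range(6)])    # [0, 0, 0, 0, 0, 0]
print([triangle_count(adj_b, v) for v in range(6)])    # [1, 1, 1, 1, 1, 1]
```

Concatenating the triangle count as a feature hands the network exactly the signal that refinement-style message passing provably cannot derive on its own.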

Common structural encodings

structural_encodings.py
import torch
from torch_geometric.utils import degree
import networkx as nx

# 1. Node degree (simplest structural feature)
deg = degree(data.edge_index[1], num_nodes=data.num_nodes)

# 2. Convert to NetworkX for complex structural features
G = nx.Graph()
G.add_nodes_from(range(data.num_nodes))  # keep isolated nodes indexed
G.add_edges_from(data.edge_index.T.tolist())

# 3. Clustering coefficient (how connected are your neighbors?)
clustering = nx.clustering(G)
cc = torch.tensor([clustering.get(i, 0) for i in range(data.num_nodes)])

# 4. Triangle count
triangles = nx.triangles(G)
tri = torch.tensor([triangles.get(i, 0) for i in range(data.num_nodes)])

# 5. PageRank centrality
pagerank = nx.pagerank(G)
pr = torch.tensor([pagerank.get(i, 0) for i in range(data.num_nodes)])

# Stack all structural features
struct_features = torch.stack([deg, cc, tri.float(), pr], dim=1)

# Concatenate with original node features
data.x = torch.cat([data.x, struct_features], dim=-1)

Compute structural features once as preprocessing. Concatenate with node features. The GNN now has explicit access to local topology information.
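The same preprocess-then-concatenate pattern can also be done in pure torch, without a NetworkX round trip, for small graphs: building a dense adjacency matrix A, the diagonal of A^3 divided by 2 gives per-node triangle counts. A minimal sketch (the helper name add_struct_features is illustrative; for large enterprise graphs you would stay sparse):

```python
import torch

def add_struct_features(x, edge_index, num_nodes):
    """Append [degree, per-node triangle count] to node features x.

    Dense-adjacency version: fine for small graphs only. diag(A^3)/2
    counts the closed length-3 walks through each node, i.e. triangles.
    """
    A = torch.zeros(num_nodes, num_nodes)
    A[edge_index[0], edge_index[1]] = 1.0
    A = torch.maximum(A, A.T)                # symmetrize: treat as undirected
    deg = A.sum(dim=1)
    tri = torch.diagonal(A @ A @ A) / 2.0
    struct = torch.stack([deg, tri], dim=1)
    return torch.cat([x, struct], dim=-1)

# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
edge_index = torch.tensor([[0, 1, 2, 2],
                           [1, 2, 0, 3]])
x = torch.ones(4, 2)                         # 4 nodes, 2 dummy features
out = add_struct_features(x, edge_index, num_nodes=4)
print(out[:, 2])   # degrees:   tensor([2., 2., 3., 1.])
print(out[:, 3])   # triangles: tensor([1., 1., 1., 0.])
```

The pendant node gets triangle count 0 while every triangle member gets 1, matching the NetworkX pipeline above.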

Concrete example: fraud detection with triangle counts

A payment network:

  • 500,000 account nodes, 10M transaction edges
  • Fraud rings form tight triangles: A pays B, B pays C, C pays A
  • Without triangle count: GCNConv cannot distinguish a node in a triangle from a node in a chain (both have degree 2)
  • With triangle count: nodes in fraud rings have triangle count > 0, chain nodes have triangle count = 0

Adding triangle count as a structural encoding gives the fraud GNN a direct signal for ring membership. This single feature can improve fraud detection AUROC by 3-5 points on graphs with ring-based fraud patterns.
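The degree-2 ambiguity in the third bullet is easy to verify without any GNN at all. A pure-Python sketch (the helper degree_and_triangles is illustrative) compares the middle node of a payment chain with a member of a three-account ring:

```python
def degree_and_triangles(edges, node):
    # Build an undirected adjacency map, then report the node's
    # degree and the number of triangles it participates in.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nb = adj.get(node, set())
    tris = sum(1 for u in nb for w in nb if u < w and w in adj[u])
    return len(nb), tris

ring  = [(0, 1), (1, 2), (2, 0)]   # A pays B, B pays C, C pays A
chain = [(0, 1), (1, 2)]           # A pays B, B pays C

print(degree_and_triangles(ring, 1))   # (2, 1): degree 2, one triangle
print(degree_and_triangles(chain, 1))  # (2, 0): same degree, no triangle
```

Both nodes look identical to a degree-based view; the triangle count is the only feature here that flags ring membership.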

Limitations and what comes next

  1. Computation cost for complex features: Degree is O(|E|). Triangle counting is O(|E|^1.5). Betweenness centrality is O(n * |E|). For very large enterprise graphs, approximate algorithms are needed for expensive metrics.
  2. Feature selection: Which structural features matter depends on the task. Triangle counts help for fraud but may not help for churn. Domain knowledge or feature importance analysis guides selection.
  3. Static features: Structural encodings are computed once on a graph snapshot. In dynamic enterprise graphs, they must be periodically recomputed as the graph evolves.
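On the first point, the O(|E|^1.5) bound for exact triangle counting comes from orienting each edge from its lower-ranked endpoint (rank = degree, ties broken by id) to the higher-ranked one, which keeps every out-neighborhood small. A compact sketch of that classic approach (the function name count_triangles is illustrative):

```python
def count_triangles(edges):
    """Exact global triangle count in roughly O(|E|^1.5) time.

    Orient every edge toward the higher-ranked endpoint; each
    triangle is then found exactly once by intersecting the
    out-neighborhoods of an oriented edge's two endpoints.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    rank = {v: (len(nb), v) for v, nb in adj.items()}
    out = {v: {u for u in nb if rank[u] > rank[v]} for v, nb in adj.items()}
    total = 0
    for v in adj:
        for u in out[v]:
            total += len(out[v] & out[u])
    return total

# Two triangles sharing the edge (1, 2).
edges = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 2)]
print(count_triangles(edges))   # 2
```

For betweenness on very large graphs, sampling a subset of source nodes (as NetworkX's betweenness_centrality supports via its k parameter) is the usual way to trade exactness for tractable runtime.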

Frequently asked questions

What are structural encodings in GNNs?

Structural encodings are additional node features that capture the local graph topology around each node. Examples include node degree, number of triangles the node participates in, local clustering coefficient, and counts of specific subgraph patterns (motifs). These features give the GNN explicit information about graph structure that message passing might not capture within its expressiveness limits.

How are structural encodings different from positional encodings?

Positional encodings (LapPE, RWPE) encode a node's global position in the graph. Structural encodings encode the local topology around the node. Two nodes in different parts of the graph might have identical structural encodings (same degree, same triangle count) but different positional encodings. They are complementary: positional = where you are, structural = what your neighborhood looks like.
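The complementarity can be made concrete with two triangles joined by a path: the corner nodes of each triangle get identical structural encodings, yet sit at different global positions. A pure-Python sketch (local_encoding and dist are illustrative names; distance to a fixed anchor node stands in for a positional encoding):

```python
from collections import deque

def local_encoding(adj, v):
    # (degree, triangle count): a minimal structural encoding.
    nb = adj[v]
    tris = sum(1 for u in nb for w in nb if u < w and w in adj[u])
    return (len(nb), tris)

def dist(adj, src, dst):
    # BFS shortest-path distance: a global, positional quantity.
    seen, q = {src: 0}, deque([src])
    while q:
        v = q.popleft()
        if v == dst:
            return seen[v]
        for u in adj[v]:
            if u not in seen:
                seen[u] = seen[v] + 1
                q.append(u)

# Two triangles joined by a path: (0,1,2) -- 3 -- 4 -- (5,6,7)
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4),
         (4, 5), (5, 6), (6, 7), (7, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Identical local structure: degree 2, one triangle each...
print(local_encoding(adj, 0), local_encoding(adj, 6))   # (2, 1) (2, 1)

# ...but different global position (distance to bridge node 3).
print(dist(adj, 0, 3), dist(adj, 6, 3))                 # 2 3
```

A structural encoding cannot tell nodes 0 and 6 apart; a positional one can, which is why the two are typically concatenated rather than used in isolation.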

Why do GNNs need structural encodings?

Standard message passing bounded by 1-WL cannot count certain substructures. For example, a node cannot determine how many triangles it participates in through message passing alone. But triangle count is highly predictive for many tasks (fraud detection, community structure). Structural encodings provide this information directly as input features, bypassing the expressiveness limitation.

What structural features are most useful for enterprise data?

For enterprise relational data: (1) Node degree captures connectivity (high-volume customers vs. low-volume). (2) Clustering coefficient indicates community embeddedness. (3) Triangle count reveals tight group membership (fraud rings form triangles). (4) PageRank centrality identifies influential nodes. (5) Betweenness centrality identifies bridge nodes connecting different communities.

How do you compute structural encodings in PyG?

PyG provides degree() in torch_geometric.utils, plus transforms such as OneHotDegree and LocalDegreeProfile that attach degree-based features. For triangle counts and clustering coefficients, use NetworkX or igraph on the edge_index. Compute once as a preprocessing step and concatenate with node features before training. The computation cost is one-time and manageable for most enterprise graphs.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.