PyG / Guide · 7 min read

Structural Encoding: Encoding Local Graph Topology Around Each Node

Structural encodings give GNNs explicit information about local graph topology that message passing alone cannot capture: degree, triangle counts, clustering coefficients, and motif frequencies. They are the simplest way to boost GNN expressiveness.

PyTorch Geometric

TL;DR

  • Structural encodings are node features computed from local graph topology: degree, triangle count, clustering coefficient, centrality measures. They provide explicit structural information.
  • Different from positional encodings: structural = what your neighborhood looks like (local topology). Positional = where you are in the graph (global position). They are complementary.
  • Motivation: 1-WL-bounded GNNs cannot count certain substructures (triangles, cycles). Structural encodings provide this as direct input features, bypassing the expressiveness limit.
  • For enterprise data: degree captures customer activity volume, triangle count reveals fraud rings, clustering coefficient indicates community membership, centrality identifies influential nodes.
  • Computation: one-time preprocessing step. Concatenate with node features before training. Degree is O(|E|), triangles are O(|E|^1.5), centrality varies by metric.

Structural encodings are additional node features computed from the local graph topology around each node, providing GNNs with explicit information about graph structure that message passing alone cannot capture within its expressiveness limits. They include simple statistics like node degree and clustering coefficient, as well as more complex features like triangle counts, motif frequencies, and centrality measures. By providing this information as direct input features, structural encodings push GNN expressiveness beyond the 1-WL bound without changing the GNN architecture.

Why it matters for enterprise data

Enterprise relational graphs contain structural patterns that carry strong predictive signal:

  • Node degree: A customer with 1,000 transactions behaves differently from one with 5. Degree captures activity volume.
  • Triangle count: Fraud rings form triangles (A → B → C → A). The number of triangles a node participates in is a direct fraud indicator.
  • Clustering coefficient: Nodes in tight communities (high clustering) have different churn patterns than peripheral nodes (low clustering).
  • Betweenness centrality: Bridge nodes that connect different customer segments are structurally important and may have unique behavior.

Standard message passing cannot count triangles (a known 1-WL limitation). Adding triangle count as a structural encoding gives the fraud detection GNN direct access to this critical signal.
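The 1-WL limitation can be checked directly in plain Python, with no GNN involved. The sketch below (function names like wl_colors are illustrative, not from any library) runs 1-WL color refinement on two classic indistinguishable graphs, a 6-cycle versus two disjoint triangles, and then shows that per-node triangle counts separate them immediately:

```python
from itertools import combinations

def neighbors(edges, n):
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def wl_colors(adj, rounds=3):
    # 1-WL color refinement: start from degrees, then repeatedly
    # hash each node's color together with its neighbors' colors.
    colors = {v: len(nb) for v, nb in adj.items()}
    for _ in range(rounds):
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return sorted(colors.values())   # color histogram of the graph

def triangle_count(adj, v):
    # Number of triangles node v participates in.
    return sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])

c6     = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]   # one 6-cycle
two_c3 = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]   # two triangles

adj_a, adj_b = neighbors(c6, 6), neighbors(two_c3, 6)

# 1-WL sees identical color histograms: every node is degree 2
# with degree-2 neighbors, in both graphs.
print(wl_colors(adj_a) == wl_colors(adj_b))            # True

# Triangle counts separate the graphs immediately.
print([triangle_count(adj_a, v) for v in range(6)])    # [0, 0, 0, 0, 0, 0]
print([triangle_count(adj_b, v) for v in range(6)])    # [1, 1, 1, 1, 1, 1]
```

Concatenating the triangle count as a feature hands the network exactly the signal that refinement-style message passing provably cannot derive on its own.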

Common structural encodings

structural_encodings.py
import torch
from torch_geometric.utils import degree
import networkx as nx

# 1. Node degree (simplest structural feature)
deg = degree(data.edge_index[1], num_nodes=data.num_nodes)

# 2. Convert to NetworkX for complex structural features
G = nx.Graph()
G.add_nodes_from(range(data.num_nodes))  # keep isolated nodes indexed
G.add_edges_from(data.edge_index.T.tolist())

# 3. Clustering coefficient (how connected are your neighbors?)
clustering = nx.clustering(G)
cc = torch.tensor([clustering.get(i, 0) for i in range(data.num_nodes)])

# 4. Triangle count
triangles = nx.triangles(G)
tri = torch.tensor([triangles.get(i, 0) for i in range(data.num_nodes)])

# 5. PageRank centrality
pagerank = nx.pagerank(G)
pr = torch.tensor([pagerank.get(i, 0) for i in range(data.num_nodes)])

# Stack all structural features
struct_features = torch.stack([deg, cc, tri.float(), pr], dim=1)

# Concatenate with original node features
data.x = torch.cat([data.x, struct_features], dim=-1)

Compute structural features once as preprocessing. Concatenate with node features. The GNN now has explicit access to local topology information.
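The same preprocess-then-concatenate pattern can also be done in pure torch, without a NetworkX round trip, for small graphs: building a dense adjacency matrix A, the diagonal of A^3 divided by 2 gives per-node triangle counts. A minimal sketch (the helper name add_struct_features is illustrative; for large enterprise graphs you would stay sparse):

```python
import torch

def add_struct_features(x, edge_index, num_nodes):
    """Append [degree, per-node triangle count] to node features x.

    Dense-adjacency version: fine for small graphs only. diag(A^3)/2
    counts the closed length-3 walks through each node, i.e. triangles.
    """
    A = torch.zeros(num_nodes, num_nodes)
    A[edge_index[0], edge_index[1]] = 1.0
    A = torch.maximum(A, A.T)                # symmetrize: treat as undirected
    deg = A.sum(dim=1)
    tri = torch.diagonal(A @ A @ A) / 2.0
    struct = torch.stack([deg, tri], dim=1)
    return torch.cat([x, struct], dim=-1)

# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
edge_index = torch.tensor([[0, 1, 2, 2],
                           [1, 2, 0, 3]])
x = torch.ones(4, 2)                         # 4 nodes, 2 dummy features
out = add_struct_features(x, edge_index, num_nodes=4)
print(out[:, 2])   # degrees:   tensor([2., 2., 3., 1.])
print(out[:, 3])   # triangles: tensor([1., 1., 1., 0.])
```

The pendant node gets triangle count 0 while every triangle member gets 1, matching the NetworkX pipeline above.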

Concrete example: fraud detection with triangle counts

A payment network:

  • 500,000 account nodes, 10M transaction edges
  • Fraud rings form tight triangles: A pays B, B pays C, C pays A
  • Without triangle count: GCNConv cannot distinguish a node in a triangle from a node in a chain (both have degree 2)
  • With triangle count: nodes in fraud rings have triangle count > 0, chain nodes have triangle count = 0

Adding triangle count as a structural encoding gives the fraud GNN a direct signal for ring membership. This single feature can improve fraud detection AUROC by 3-5 points on graphs with ring-based fraud patterns.
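The degree-2 ambiguity in the third bullet is easy to verify without any GNN at all. A pure-Python sketch (the helper degree_and_triangles is illustrative) compares the middle node of a payment chain with a member of a three-account ring:

```python
def degree_and_triangles(edges, node):
    # Build an undirected adjacency map, then report the node's
    # degree and the number of triangles it participates in.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nb = adj.get(node, set())
    tris = sum(1 for u in nb for w in nb if u < w and w in adj[u])
    return len(nb), tris

ring  = [(0, 1), (1, 2), (2, 0)]   # A pays B, B pays C, C pays A
chain = [(0, 1), (1, 2)]           # A pays B, B pays C

print(degree_and_triangles(ring, 1))   # (2, 1): degree 2, one triangle
print(degree_and_triangles(chain, 1))  # (2, 0): same degree, no triangle
```

Both nodes look identical to a degree-based view; the triangle count is the only feature here that flags ring membership.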

Limitations and what comes next

  1. Computation cost for complex features: Degree is O(|E|). Triangle counting is O(|E|^1.5). Betweenness centrality is O(n * |E|). For very large enterprise graphs, approximate algorithms are needed for expensive metrics.
  2. Feature selection: Which structural features matter depends on the task. Triangle counts help for fraud but may not help for churn. Domain knowledge or feature importance analysis guides selection.
  3. Static features: Structural encodings are computed once on a graph snapshot. In dynamic enterprise graphs, they must be periodically recomputed as the graph evolves.
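On the first point, the O(|E|^1.5) bound for exact triangle counting comes from orienting each edge from its lower-ranked endpoint (rank = degree, ties broken by id) to the higher-ranked one, which keeps every out-neighborhood small. A compact sketch of that classic approach (the function name count_triangles is illustrative):

```python
def count_triangles(edges):
    """Exact global triangle count in roughly O(|E|^1.5) time.

    Orient every edge toward the higher-ranked endpoint; each
    triangle is then found exactly once by intersecting the
    out-neighborhoods of an oriented edge's two endpoints.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    rank = {v: (len(nb), v) for v, nb in adj.items()}
    out = {v: {u for u in nb if rank[u] > rank[v]} for v, nb in adj.items()}
    total = 0
    for v in adj:
        for u in out[v]:
            total += len(out[v] & out[u])
    return total

# Two triangles sharing the edge (1, 2).
edges = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 2)]
print(count_triangles(edges))   # 2
```

For betweenness on very large graphs, sampling a subset of source nodes (as NetworkX's betweenness_centrality supports via its k parameter) is the usual way to trade exactness for tractable runtime.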

Frequently asked questions

What are structural encodings in GNNs?

Structural encodings are additional node features that capture the local graph topology around each node. Examples include node degree, number of triangles the node participates in, local clustering coefficient, and counts of specific subgraph patterns (motifs). These features give the GNN explicit information about graph structure that message passing might not capture within its expressiveness limits.

How are structural encodings different from positional encodings?

Positional encodings (LapPE, RWPE) encode a node's global position in the graph. Structural encodings encode the local topology around the node. Two nodes in different parts of the graph might have identical structural encodings (same degree, same triangle count) but different positional encodings. They are complementary: positional = where you are, structural = what your neighborhood looks like.
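The complementarity can be made concrete with two triangles joined by a path: the corner nodes of each triangle get identical structural encodings, yet sit at different global positions. A pure-Python sketch (local_encoding and dist are illustrative names; distance to a fixed anchor node stands in for a positional encoding):

```python
from collections import deque

def local_encoding(adj, v):
    # (degree, triangle count): a minimal structural encoding.
    nb = adj[v]
    tris = sum(1 for u in nb for w in nb if u < w and w in adj[u])
    return (len(nb), tris)

def dist(adj, src, dst):
    # BFS shortest-path distance: a global, positional quantity.
    seen, q = {src: 0}, deque([src])
    while q:
        v = q.popleft()
        if v == dst:
            return seen[v]
        for u in adj[v]:
            if u not in seen:
                seen[u] = seen[v] + 1
                q.append(u)

# Two triangles joined by a path: (0,1,2) -- 3 -- 4 -- (5,6,7)
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4),
         (4, 5), (5, 6), (6, 7), (7, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Identical local structure: degree 2, one triangle each...
print(local_encoding(adj, 0), local_encoding(adj, 6))   # (2, 1) (2, 1)

# ...but different global position (distance to bridge node 3).
print(dist(adj, 0, 3), dist(adj, 6, 3))                 # 2 3
```

A structural encoding cannot tell nodes 0 and 6 apart; a positional one can, which is why the two are typically concatenated rather than used in isolation.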

Why do GNNs need structural encodings?

Standard message passing bounded by 1-WL cannot count certain substructures. For example, a node cannot determine how many triangles it participates in through message passing alone. But triangle count is highly predictive for many tasks (fraud detection, community structure). Structural encodings provide this information directly as input features, bypassing the expressiveness limitation.

What structural features are most useful for enterprise data?

For enterprise relational data: (1) Node degree captures connectivity (high-volume customers vs. low-volume). (2) Clustering coefficient indicates community embeddedness. (3) Triangle count reveals tight group membership (fraud rings form triangles). (4) PageRank centrality identifies influential nodes. (5) Betweenness centrality identifies bridge nodes connecting different communities.

How do you compute structural encodings in PyG?

PyG provides degree() in torch_geometric.utils, plus transforms such as OneHotDegree and LocalDegreeProfile that attach degree-based features. For triangle counts and clustering coefficients, use NetworkX or igraph on the edge_index. Compute once as a preprocessing step and concatenate with node features before training. The computation cost is one-time and manageable for most enterprise graphs.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.