Structural encodings are additional node features computed from the local graph topology around each node, providing GNNs with explicit information about graph structure that message passing alone cannot capture within its expressiveness limits. They include simple statistics like node degree and clustering coefficient, as well as more complex features like triangle counts, motif frequencies, and centrality measures. By providing this information as direct input features, structural encodings push GNN expressiveness beyond the 1-WL bound without changing the GNN architecture.
Why it matters for enterprise data
Enterprise relational graphs contain structural patterns that carry strong predictive signal:
- Node degree: A customer with 1,000 transactions behaves differently from one with 5. Degree captures activity volume.
- Triangle count: Fraud rings form triangles (A → B → C → A). The number of triangles a node participates in is a direct fraud indicator.
- Clustering coefficient: Nodes in tight communities (high clustering) have different churn patterns than peripheral nodes (low clustering).
- Betweenness centrality: Bridge nodes that connect different customer segments are structurally important and may have unique behavior.
Standard message passing cannot count triangles (a known 1-WL limitation). Adding triangle count as a structural encoding gives the fraud detection GNN direct access to this critical signal.
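The 1-WL blind spot is easy to see on a minimal pair of graphs: a 6-cycle and two disjoint triangles are both 2-regular, so 1-WL color refinement (and hence plain message passing) assigns every node the same color in both graphs, while triangle counts separate them instantly. A small sketch using a naive hand-rolled WL refinement (the `wl_colors` helper is illustrative, not a library function):

```python
import networkx as nx

def wl_colors(G, rounds=3):
    """Naive 1-WL color refinement: repeatedly replace each node's color
    with a hash of its own color plus the sorted multiset of neighbor colors."""
    colors = {v: 0 for v in G}
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in G[v]))))
            for v in G
        }
    return sorted(colors.values())

# Two 2-regular graphs on 6 nodes: a 6-cycle vs. two disjoint triangles.
c6 = nx.cycle_graph(6)
two_tris = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))

# 1-WL produces identical color multisets -> message passing can't tell them apart...
print(wl_colors(c6) == wl_colors(two_tris))  # True

# ...but per-node triangle counts separate them immediately.
print(sum(nx.triangles(c6).values()))        # 0
print(sum(nx.triangles(two_tris).values()))  # 6 (one triangle per node, 6 nodes)
```

Feeding `nx.triangles` output in as a node feature is exactly the structural-encoding trick: the GNN no longer has to count triangles, it reads them.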
Common structural encodings
import torch
from torch_geometric.utils import degree
import networkx as nx
# 1. Node degree (simplest structural feature)
# edge_index[1] holds target nodes, so this is in-degree; for undirected
# graphs stored with both edge directions, in-degree equals degree.
deg = degree(data.edge_index[1], num_nodes=data.num_nodes)
# 2. Convert to NetworkX for complex structural features
G = nx.Graph()
G.add_nodes_from(range(data.num_nodes))  # include isolated nodes too
G.add_edges_from(data.edge_index.T.tolist())
# 3. Clustering coefficient (how connected are your neighbors?)
clustering = nx.clustering(G)
cc = torch.tensor([clustering.get(i, 0.0) for i in range(data.num_nodes)], dtype=torch.float)
# 4. Triangle count
triangles = nx.triangles(G)
tri = torch.tensor([triangles.get(i, 0) for i in range(data.num_nodes)])
# 5. PageRank centrality
pagerank = nx.pagerank(G)
pr = torch.tensor([pagerank.get(i, 0.0) for i in range(data.num_nodes)], dtype=torch.float)
# Stack all structural features
struct_features = torch.stack([deg, cc, tri.float(), pr], dim=1)
# Concatenate with original node features
data.x = torch.cat([data.x, struct_features], dim=-1)
Compute structural features once as preprocessing. Concatenate with node features. The GNN now has explicit access to local topology information.
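One practical detail the snippet above glosses over: degree and triangle count are heavy-tailed in real enterprise graphs, so raw values can dominate training. A common (though not mandatory) preprocessing step is to log-scale the heavy-tailed columns and standardize before concatenating. A minimal sketch with hypothetical feature values:

```python
import torch

# Hypothetical raw structural features for 4 nodes:
# columns = [degree, clustering, triangles, pagerank]
struct = torch.tensor([
    [1000.0, 0.01, 250.0, 0.0300],
    [   5.0, 0.40,   2.0, 0.0001],
    [  42.0, 0.15,  10.0, 0.0020],
    [   2.0, 0.00,   0.0, 0.0001],
])

# Log-scale the heavy-tailed columns (degree, triangle count)...
struct[:, 0] = torch.log1p(struct[:, 0])
struct[:, 2] = torch.log1p(struct[:, 2])

# ...then standardize every column to zero mean, unit variance,
# so no single structural feature dominates the learned weights.
struct = (struct - struct.mean(dim=0)) / (struct.std(dim=0) + 1e-8)
```

Whether to log-scale is a judgment call per feature; clustering coefficient and PageRank are already bounded and usually only need standardization.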
Concrete example: fraud detection with triangle counts
A payment network:
- 500,000 account nodes, 10M transaction edges
- Fraud rings form tight triangles: A pays B, B pays C, C pays A
- Without triangle count: GCNConv cannot distinguish a node in a triangle from a node in a chain (both have degree 2)
- With triangle count: nodes in fraud rings have triangle count > 0, chain nodes have triangle count = 0
Adding triangle count as a structural encoding gives the fraud GNN a direct signal for ring membership. This single feature can improve fraud detection AUROC by 3-5 points on graphs with ring-based fraud patterns.
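The ring-vs-chain distinction can be reproduced on a toy payment graph (account names here are hypothetical): a 3-account ring and a payment chain give some nodes identical degree, but `nx.triangles` separates them.

```python
import networkx as nx

# Toy payment graph: a 3-account fraud ring plus an innocuous chain.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A")])   # ring: A-B-C-A
G.add_edges_from([("D", "E"), ("E", "F"), ("F", "G")])   # chain: D-E-F-G

tri = nx.triangles(G)
print(G.degree("B") == G.degree("E"))  # True: both have degree 2
print(tri["B"])                        # 1 -> ring member
print(tri["E"])                        # 0 -> chain member
```

Degree alone is blind to ring membership; the triangle-count feature carries the signal directly.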
Limitations and what comes next
- Computation cost for complex features: Degree is O(|E|). Triangle counting is O(|E|^1.5). Betweenness centrality is O(n * |E|). For very large enterprise graphs, approximate algorithms are needed for expensive metrics.
- Feature selection: Which structural features matter depends on the task. Triangle counts help for fraud but may not help for churn. Domain knowledge or feature importance analysis guides selection.
- Static features: Structural encodings are computed once on a graph snapshot. In dynamic enterprise graphs, they must be periodically recomputed as the graph evolves.
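On the cost point: for betweenness centrality, NetworkX already ships a sampling-based approximation via the `k` parameter of `betweenness_centrality`, which accumulates shortest paths from only `k` pivot nodes instead of all `n`, cutting cost from O(n * |E|) to roughly O(k * |E|). A small sketch on a random graph (graph size and `k` are illustrative choices):

```python
import networkx as nx

G = nx.erdos_renyi_graph(300, 0.03, seed=0)

# Exact Brandes betweenness: O(n * |E|). Fine here, too slow at enterprise scale.
exact = nx.betweenness_centrality(G)

# Sampled approximation: accumulate from only k=100 pivot nodes.
approx = nx.betweenness_centrality(G, k=100, seed=0)

# Compare the two estimates node by node.
err = max(abs(exact[v] - approx[v]) for v in G)
print(err)
```

The same recompute-vs-approximate trade-off applies to the staleness point: cheap features like degree can be refreshed on every snapshot, while expensive centralities can be approximated or refreshed on a slower cadence.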