Node type encoding maps nodes of different types into a shared embedding space so they can participate in the same graph neural network computation. In a relational database converted to a graph, each table becomes a node type: customers, orders, products, merchants. Each type has a different set of features with different dimensionalities and semantics. Node type encoding bridges this gap.
The heterogeneity problem
A homogeneous GNN assumes all nodes share the same feature space. This works for citation networks (all nodes are papers) but fails for enterprise data. Consider:
- Customer: [age, income, location, tenure] (4 features, mixed types)
- Product: [price, weight, category, brand, description_embedding] (5 features + 128-dim embedding)
- Order: [amount, quantity, discount, timestamp] (4 features)
You cannot feed a 4-dimensional customer vector and a 133-dimensional product vector into the same linear layer. Node type encoding solves this by giving each type its own input projection into a common d-dimensional hidden space.
Encoding strategies
Strategy 1: Type-specific linear projections
The simplest approach: each node type gets its own linear layer that projects from its native feature dimension to the shared hidden dimension.
import torch.nn as nn

class NodeTypeEncoder(nn.Module):
    def __init__(self, type_dims, hidden_dim):
        super().__init__()
        # One projection per node type, built from each type's native dimension,
        # e.g. type_dims = {'customer': 4, 'product': 133, 'order': 4, 'merchant': 12}
        self.projections = nn.ModuleDict({
            node_type: nn.Linear(in_dim, hidden_dim)
            for node_type, in_dim in type_dims.items()
        })

    def forward(self, x_dict):
        return {
            node_type: self.projections[node_type](x)
            for node_type, x in x_dict.items()
        }

After this projection, all node types live in the same hidden_dim space and can participate in message passing.
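As a quick sanity check, the same per-type projection pattern can be exercised with random tensors (the batch size of 8 and hidden dimension of 64 are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# Feature dimensions matching the customer/product/order/merchant example above
type_dims = {'customer': 4, 'product': 133, 'order': 4, 'merchant': 12}
hidden_dim = 64

# One linear projection per node type, as in NodeTypeEncoder
projections = nn.ModuleDict({
    t: nn.Linear(d, hidden_dim) for t, d in type_dims.items()
})

# 8 random nodes per type, each in its native feature space
x_dict = {t: torch.randn(8, d) for t, d in type_dims.items()}
h_dict = {t: projections[t](x) for t, x in x_dict.items()}
```

Every entry of h_dict now has shape (8, 64): the 4-dimensional customers and the 133-dimensional products end up in the same space.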
Strategy 2: Type-specific MLPs
For richer encoding, use a small MLP per type instead of a single linear layer. This is useful when raw features need nonlinear transformations (e.g., log-scaling prices, handling categoricals).
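A minimal sketch of this variant; the two-layer depth and the ReLU nonlinearity are illustrative choices, not requirements:

```python
import torch
import torch.nn as nn

class NodeTypeMLPEncoder(nn.Module):
    """One small MLP per node type instead of a single linear projection."""
    def __init__(self, type_dims, hidden_dim):
        super().__init__()
        self.mlps = nn.ModuleDict({
            node_type: nn.Sequential(
                nn.Linear(in_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            for node_type, in_dim in type_dims.items()
        })

    def forward(self, x_dict):
        return {t: self.mlps[t](x) for t, x in x_dict.items()}

enc = NodeTypeMLPEncoder({'customer': 4, 'product': 133}, hidden_dim=32)
out = enc({'customer': torch.randn(5, 4), 'product': torch.randn(3, 133)})
```

The extra nonlinearity lets each encoder learn type-specific transformations of its raw features before they enter the shared space.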
Strategy 3: Shared projection + type embedding
A parameter-efficient alternative: use a shared projection layer for all types (padding shorter feature vectors to a common length) and add a learned type embedding that tells the model which type each node belongs to.
class SharedTypeEncoder(nn.Module):
    def __init__(self, max_features, hidden_dim, num_types):
        super().__init__()
        self.shared_proj = nn.Linear(max_features, hidden_dim)
        self.type_embed = nn.Embedding(num_types, hidden_dim)

    def forward(self, x_padded, type_ids):
        # x_padded: all features zero-padded to max_features
        h = self.shared_proj(x_padded)
        h = h + self.type_embed(type_ids)  # add type signal
        return h

The shared projection uses fewer parameters but requires padding. The type embedding compensates for the lost type information.
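To make the padding concrete, here is one way (an illustrative convention, not the only one) to build x_padded and type_ids from raw per-type feature matrices:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

raw = {'customer': torch.randn(3, 4), 'product': torch.randn(2, 133)}
type_to_id = {'customer': 0, 'product': 1}
max_features = max(x.shape[1] for x in raw.values())  # 133

rows, ids = [], []
for t, x in raw.items():
    pad = max_features - x.shape[1]
    rows.append(F.pad(x, (0, pad)))  # zero-pad the feature dimension
    ids.append(torch.full((x.shape[0],), type_to_id[t], dtype=torch.long))

x_padded = torch.cat(rows)   # (5, 133): all nodes, one padded feature space
type_ids = torch.cat(ids)    # (5,): which type each row belongs to

shared_proj = nn.Linear(max_features, 64)
type_embed = nn.Embedding(len(type_to_id), 64)
h = shared_proj(x_padded) + type_embed(type_ids)
```

Without the type embedding, a padded customer row and a product row with similar values would be indistinguishable to the shared projection.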
Beyond feature projection
Node type encoding involves more than dimensionality alignment:
- Feature normalization: Different types have different value ranges. Customer age (0-100) and order amount ($0-$10,000) need separate normalization before projection.
- Missing feature handling: Some node types have sparse features: products without reviews, customers without demographic data. Type-specific encoders can handle missingness differently for each type.
- Categorical encoding: Some types are mostly categorical (product category, customer segment). The type-specific encoder can include embedding layers for categoricals before the projection.
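Combining these ideas, a type-specific encoder for a mostly-categorical type might embed each categorical field and normalize numeric fields before the projection. The field names, cardinalities, and log-scaling below are hypothetical choices for illustration:

```python
import torch
import torch.nn as nn

class ProductEncoder(nn.Module):
    """Hypothetical product encoder: embeds categorical fields, log-scales
    the price, then projects everything to the shared hidden space."""
    def __init__(self, num_categories, num_brands, hidden_dim, cat_dim=8):
        super().__init__()
        self.category_embed = nn.Embedding(num_categories, cat_dim)
        self.brand_embed = nn.Embedding(num_brands, cat_dim)
        self.proj = nn.Linear(2 * cat_dim + 1, hidden_dim)

    def forward(self, price, category_id, brand_id):
        num = torch.log1p(price).unsqueeze(1)  # (N, 1): compress price range
        cats = torch.cat([self.category_embed(category_id),
                          self.brand_embed(brand_id)], dim=1)
        return self.proj(torch.cat([num, cats], dim=1))

enc = ProductEncoder(num_categories=10, num_brands=50, hidden_dim=64)
h = enc(torch.rand(16) * 100,
        torch.randint(0, 10, (16,)),
        torch.randint(0, 50, (16,)))
```

The same pattern generalizes: each type's encoder owns whatever embeddings and normalization its raw features need, as long as the output lands in hidden_dim.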
In message passing
After node type encoding, all nodes live in the same d-dimensional space. Message passing then operates across types:
- Customer node sends its d-dim representation to connected order nodes
- Order node aggregates messages from its customer and product neighbors
- The aggregation combines information from different types in the shared space
Type-aware message passing layers (such as HGTConv and HeteroConv in PyTorch Geometric) can further apply type-specific transformations during aggregation, but the initial node type encoding is what makes cross-type message passing possible at all.
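The mechanics of the aggregation step can be sketched without any heterogeneous-GNN library. A plain mean aggregation (standing in for the type-aware schemes above) works only because every neighbor, whatever its type, shares the same dimension; the specific neighbor indices below are made up:

```python
import torch

hidden_dim = 8
h_customer = torch.randn(4, hidden_dim)  # 4 customer nodes, already encoded
h_product = torch.randn(6, hidden_dim)   # 6 product nodes, already encoded

# Hypothetical edges into one order node: placed by customer 2,
# containing products 1 and 5
neighbor_msgs = torch.stack([h_customer[2], h_product[1], h_product[5]])

# Mean aggregation mixes customer and product information freely
h_order = neighbor_msgs.mean(dim=0)
```

If the customer and product representations lived in different spaces, this stack-and-mean would be meaningless; node type encoding is what licenses it.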