Relation type encoding represents the different edge types in a heterogeneous graph so the GNN can apply a distinct learned transformation to each relationship. In an enterprise graph, “purchased,” “viewed,” and “returned” are three different customer-product relationships. Without relation type encoding, the GNN treats them identically. With it, the model can learn that purchases are strong positive signals, views are weak signals, and returns are negative signals.
Why edge types matter
Consider a fraud detection graph with three edge types between accounts: “transferred_to,” “shared_device,” and “same_merchant.” Each carries a different fraud signal:
- transferred_to: Direct money flow. Strong fraud signal when circular.
- shared_device: Same phone or IP address. Moderate signal for organized fraud rings.
- same_merchant: Common merchant. Weak signal, high-frequency but low specificity.
A GNN that treats all three as generic “connected” edges cannot learn these distinctions. The aggregated message from a transferred_to neighbor should be weighted differently than one from a same_merchant neighbor.
Encoding approaches
Approach 1: Separate weight matrices (R-GCN)
The Relational Graph Convolutional Network uses a different learned weight matrix for each relation type. Messages from “purchased” edges are transformed by W_purchased, messages from “returned” edges by W_returned, and so on.
```python
from torch_geometric.nn import RGCNConv

# R-GCN: separate weight matrix per relation type
conv = RGCNConv(
    in_channels=64,
    out_channels=64,
    num_relations=5,  # purchased, viewed, returned, reviewed, wishlisted
    num_bases=3,      # basis decomposition to reduce parameters
)

# Forward: edge_type is an integer tensor mapping each edge to its type
out = conv(x, edge_index, edge_type)
# Internally: for each relation r, compute W_r @ x_neighbors,
# then aggregate across all relation types
```

R-GCN uses basis decomposition (num_bases) to share parameters across relations, preventing parameter explosion with many edge types.
Approach 2: Relation-specific attention
Instead of separate weight matrices, use a single weight matrix but compute attention scores that depend on the relation type. The attention mechanism learns how much weight to give each neighbor based on both its features and the relationship type.
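A minimal sketch of this idea in plain PyTorch (not an off-the-shelf PyG layer; the names `RelationAttention`, `lin`, and `att` are illustrative), assuming all edges passed in belong to a single target node's neighborhood:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAttention(nn.Module):
    """One shared weight matrix plus a per-relation attention vector a_r.

    The attention logit for an edge depends on both the transformed
    neighbor features and the edge's relation type.
    """
    def __init__(self, hidden_dim, num_relations):
        super().__init__()
        self.lin = nn.Linear(hidden_dim, hidden_dim)  # shared transform for all relations
        self.att = nn.Parameter(torch.randn(num_relations, hidden_dim))  # a_r per relation

    def forward(self, x_neighbors, edge_type):
        # x_neighbors: [num_edges, hidden_dim], edge_type: [num_edges]
        h = self.lin(x_neighbors)
        # Relation-specific logit: a_{r(e)} . h_e for each edge e
        scores = (self.att[edge_type] * h).sum(dim=-1)
        # Normalize over the (single) target node's neighborhood
        alpha = F.softmax(scores, dim=0)
        # Attention-weighted aggregate of neighbor messages
        return (alpha.unsqueeze(-1) * h).sum(dim=0)
```

Compared with R-GCN, this keeps the parameter count almost flat as relations are added: only one attention vector per relation, not a full weight matrix.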
Approach 3: Additive relation embeddings
The simplest approach: maintain a learned embedding vector for each relation type. Before aggregation, add the relation embedding to the neighbor's message. This is computationally cheap and works surprisingly well:
```python
import torch
import torch.nn as nn

class RelationAwareMessage(nn.Module):
    def __init__(self, hidden_dim, num_relations):
        super().__init__()
        self.rel_embed = nn.Embedding(num_relations, hidden_dim)

    def compute_message(self, x_neighbor, edge_type):
        # Add relation embedding to neighbor features
        return x_neighbor + self.rel_embed(edge_type)
```

Additive relation embeddings: one line turns a homogeneous GNN into a relation-aware one.
Enterprise databases as multi-relational graphs
When converting an enterprise database to a graph, each foreign key relationship becomes an edge type. A typical e-commerce database produces:
- customer → order (placed_by)
- order → product (contains)
- customer → product (viewed, purchased, returned, reviewed, wishlisted)
- product → category (belongs_to)
- customer → customer (referred)
That is nine relation types from a simple 4-table database. A financial services database with accounts, transactions, merchants, devices, and alerts easily produces 15-20 relation types. Proper encoding of each is essential for the GNN to distinguish signal from noise.
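As a sketch, the relation list above can be enumerated into the integer ids that R-GCN-style layers expect in their edge_type tensor (the names `RELATIONS` and `REL_TO_ID` are illustrative, and the ordering is arbitrary):

```python
# Enumerate each (source_type, relation, target_type) triplet from the schema
RELATIONS = [
    ("customer", "placed_by", "order"),
    ("order", "contains", "product"),
    ("customer", "viewed", "product"),
    ("customer", "purchased", "product"),
    ("customer", "returned", "product"),
    ("customer", "reviewed", "product"),
    ("customer", "wishlisted", "product"),
    ("product", "belongs_to", "category"),
    ("customer", "referred", "customer"),
]

# Map each triplet to the integer id used in the edge_type tensor
REL_TO_ID = {rel: i for i, rel in enumerate(RELATIONS)}

print(len(RELATIONS))                                   # 9
print(REL_TO_ID[("customer", "purchased", "product")])  # 3
```

With this mapping, num_relations for an RGCNConv layer is simply `len(RELATIONS)`, and each edge's entry in edge_type is the id of its triplet.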
PyG implementation with HeteroData
PyTorch Geometric's HeteroData natively supports typed edges. Each edge type is a triplet: (source_node_type, relation_type, target_node_type).
```python
from torch_geometric.data import HeteroData
from torch_geometric.nn import HeteroConv, SAGEConv

data = HeteroData()
data['customer'].x = customer_features
data['product'].x = product_features

# Different edge types: (source_node_type, relation_type, target_node_type)
data['customer', 'purchased', 'product'].edge_index = purchase_edges
data['customer', 'viewed', 'product'].edge_index = view_edges
data['customer', 'returned', 'product'].edge_index = return_edges

# HeteroConv: separate layer per edge type
conv = HeteroConv({
    ('customer', 'purchased', 'product'): SAGEConv(64, 64),
    ('customer', 'viewed', 'product'): SAGEConv(64, 64),
    ('customer', 'returned', 'product'): SAGEConv(64, 64),
})

# Forward pass takes per-type feature and edge-index dicts
out_dict = conv(data.x_dict, data.edge_index_dict)
```

HeteroConv dispatches a separate GNN layer per edge type, so each relation gets its own learned transformation; the output is a dict of updated features per destination node type.