
Relation Type Encoding: Representing Edge Types in Heterogeneous Graphs

A 'purchased' edge and a 'returned' edge connect the same node types but carry opposite signals. Relation type encoding tells the GNN which is which, enabling it to treat different relationships with different learned transformations.

PyTorch Geometric

TL;DR

  • Relation type encoding assigns distinct learned representations to each edge type in a heterogeneous graph, enabling the GNN to apply different transformations per relationship type.
  • Three approaches: separate weight matrices per relation (R-GCN), relation-specific attention (R-GAT), and additive relation embeddings added to messages before aggregation.
  • Enterprise databases naturally produce many relation types: purchased, viewed, returned, reviewed, wishlisted. Each foreign key with a semantic qualifier becomes a distinct edge type.
  • In PyG, use HeteroData with typed edges and HeteroConv to dispatch different GNN layers per edge type, or use a relation-aware layer like RGCNConv.
  • Proper relation encoding typically improves heterogeneous graph tasks by 3-8% AUROC compared to collapsing all edge types into one.

Relation type encoding represents different edge types in a heterogeneous graph so the GNN can apply distinct learned transformations per relationship. In an enterprise graph, “purchased,” “viewed,” and “returned” are three different customer-to-product relationships. Without relation type encoding, the GNN treats them identically. With it, the model can learn that purchases are positive signals, views are weak signals, and returns are negative signals.

Why edge types matter

Consider a fraud detection graph with three edge types between accounts: “transferred_to,” “shared_device,” and “same_merchant.” Each carries a different fraud signal:

  • transferred_to: Direct money flow. Strong fraud signal when circular.
  • shared_device: Same phone or IP address. Moderate signal for organized fraud rings.
  • same_merchant: Common merchant. Weak signal, high-frequency but low specificity.

A GNN that treats all three as generic “connected” edges cannot learn these distinctions. The aggregated message from a transferred_to neighbor should be weighted differently than one from a same_merchant neighbor.
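To make this concrete, here is a minimal pure-PyTorch sketch (the setup and names are hypothetical, not a PyG API) in which each incoming message is transformed by a weight matrix selected by its edge type, so a transferred_to message and a same_merchant message are projected differently before aggregation:

```python
import torch

# Toy setup: 3 relation types, 4 incoming messages of dimension 8
num_relations, dim = 3, 8
W = torch.nn.Parameter(torch.randn(num_relations, dim, dim))  # one matrix per relation

x_neighbors = torch.randn(4, dim)       # neighbor features
edge_type = torch.tensor([0, 1, 2, 0])  # transferred_to=0, shared_device=1, same_merchant=2

# Select W_r per edge, then transform: message_i = W_{r_i} @ x_i
messages = torch.einsum('eij,ej->ei', W[edge_type], x_neighbors)
aggregated = messages.sum(dim=0)        # simple sum aggregation
print(aggregated.shape)                 # torch.Size([8])
```

Collapsing the three relations into one would replace `W[edge_type]` with a single shared matrix, erasing exactly the distinction described above.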

Encoding approaches

Approach 1: Separate weight matrices (R-GCN)

The Relational Graph Convolutional Network uses a different learned weight matrix for each relation type. Messages from “purchased” edges are transformed by W_purchased, messages from “returned” edges by W_returned, and so on.

rgcn_relation_encoding.py
import torch
from torch_geometric.nn import RGCNConv

# R-GCN: separate weight matrix per relation type
conv = RGCNConv(
    in_channels=64,
    out_channels=64,
    num_relations=5,       # purchased, viewed, returned, reviewed, wishlisted
    num_bases=3,           # basis decomposition to reduce parameters
)

# Toy graph: 10 nodes, 20 edges, each edge labeled with one of 5 relations
x = torch.randn(10, 64)
edge_index = torch.randint(0, 10, (2, 20))
edge_type = torch.randint(0, 5, (20,))

# Forward: edge_type is an integer tensor mapping each edge to its type
out = conv(x, edge_index, edge_type)   # shape [10, 64]
# Internally: for each relation r, compute W_r @ x_neighbors,
# then aggregate across all relation types

R-GCN uses basis decomposition (num_bases) to share parameters across relations, preventing parameter explosion with many edge types.
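The idea behind basis decomposition can be sketched in a few lines of plain PyTorch (a simplified illustration, not PyG's internal implementation): each relation's weight matrix is a learned mixture of a small set of shared basis matrices, W_r = Σ_b a_{r,b} · B_b.

```python
import torch

num_relations, num_bases, dim = 5, 3, 64

# Shared basis matrices and per-relation mixing coefficients
bases = torch.randn(num_bases, dim, dim)        # B_1 .. B_3
coeffs = torch.randn(num_relations, num_bases)  # a_{r,b}

# W_r = sum_b a_{r,b} * B_b  -> 5 relation matrices built from 3 bases
W = torch.einsum('rb,bij->rij', coeffs, bases)
print(W.shape)  # torch.Size([5, 64, 64])

# Parameter count: 5*64*64 = 20480 unshared
# vs 3*64*64 + 5*3 = 12303 with basis sharing
```

With dozens of relation types the savings grow accordingly, since only the small coefficient table scales with the number of relations.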

Approach 2: Relation-specific attention

Instead of separate weight matrices, use a single weight matrix but compute attention scores that depend on the relation type. The attention mechanism learns how much weight to give each neighbor based on both its features and the relationship type.
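One way to sketch this idea (a simplified, hypothetical module, not PyG's RGATConv) is to feed a learned relation embedding into the attention score alongside the node features:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAttention(nn.Module):
    """Attention logits depend on target, source, and relation embedding."""
    def __init__(self, dim, num_relations):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.rel_embed = nn.Embedding(num_relations, dim)
        self.att = nn.Linear(3 * dim, 1)  # scores [x_dst | x_src | rel]

    def forward(self, x_dst, x_neighbors, edge_type):
        # One destination node attending over its typed neighbors
        rel = self.rel_embed(edge_type)               # [E, dim]
        dst = x_dst.expand_as(x_neighbors)            # broadcast target features
        logits = self.att(torch.cat([dst, x_neighbors, rel], dim=-1))
        alpha = F.softmax(logits, dim=0)              # weights over neighbors
        return (alpha * self.proj(x_neighbors)).sum(dim=0)

layer = RelationAttention(dim=8, num_relations=3)
out = layer(torch.randn(1, 8), torch.randn(5, 8),
            torch.tensor([0, 1, 2, 0, 1]))           # aggregated message, shape [8]
```

Because the relation embedding enters the score, the same neighbor features can receive high attention under one relation and low attention under another.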

Approach 3: Additive relation embeddings

The simplest approach: maintain a learned embedding vector for each relation type. Before aggregation, add the relation embedding to the neighbor's message. This is computationally cheap and works surprisingly well:

additive_relation.py
import torch
import torch.nn as nn

class RelationAwareMessage(nn.Module):
    def __init__(self, hidden_dim, num_relations):
        super().__init__()
        self.rel_embed = nn.Embedding(num_relations, hidden_dim)

    def compute_message(self, x_neighbor, edge_type):
        # Add relation embedding to neighbor features
        return x_neighbor + self.rel_embed(edge_type)

# Usage: 4 neighbor messages across 3 relation types
msg = RelationAwareMessage(hidden_dim=8, num_relations=3)
out = msg.compute_message(torch.randn(4, 8), torch.tensor([0, 2, 1, 0]))  # shape [4, 8]

Additive relation embeddings: one line changes a homogeneous GNN into a relation-aware one.

Enterprise databases as multi-relational graphs

When converting an enterprise database to a graph, each foreign key relationship becomes an edge type. A typical e-commerce database produces:

  • customer → order (placed_by)
  • order → product (contains)
  • customer → product (viewed, purchased, returned, reviewed, wishlisted)
  • product → category (belongs_to)
  • customer → customer (referred)

That is 8+ relation types from a simple 4-table database. A financial services database with accounts, transactions, merchants, devices, and alerts easily produces 15-20 relation types. Proper encoding of each is essential for the GNN to distinguish signal from noise.
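In practice, those relation names become integer ids for a relation-aware layer such as RGCNConv. A minimal sketch (edge list and relation names are illustrative) of building the edge_type tensor from labeled edges:

```python
import torch

# Hypothetical edges extracted from the database: (src, relation, dst)
edges = [
    (0, 'purchased', 10),
    (0, 'viewed', 11),
    (1, 'returned', 10),
    (1, 'purchased', 12),
]

# Stable mapping from relation name to integer id
relations = sorted({rel for _, rel, _ in edges})
rel_to_id = {rel: i for i, rel in enumerate(relations)}

edge_index = torch.tensor([[s for s, _, _ in edges],
                           [d for _, _, d in edges]])
edge_type = torch.tensor([rel_to_id[rel] for _, rel, _ in edges])
print(rel_to_id)   # {'purchased': 0, 'returned': 1, 'viewed': 2}
print(edge_type)   # tensor([0, 2, 1, 0])
```

Sorting the relation names keeps the id assignment deterministic across runs, which matters when checkpointing a model whose weight matrices are indexed by relation id.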

PyG implementation with HeteroData

PyTorch Geometric's HeteroData natively supports typed edges. Each edge type is a triplet: (source_node_type, relation_type, target_node_type).

hetero_data_edges.py
import torch
from torch_geometric.data import HeteroData
from torch_geometric.nn import HeteroConv, SAGEConv

# Toy features: 100 customers, 50 products, 64-dim each
customer_features = torch.randn(100, 64)
product_features = torch.randn(50, 64)

# Toy edges: [2, num_edges] with rows (customer_idx, product_idx)
purchase_edges = torch.stack([torch.randint(0, 100, (200,)), torch.randint(0, 50, (200,))])
view_edges = torch.stack([torch.randint(0, 100, (500,)), torch.randint(0, 50, (500,))])
return_edges = torch.stack([torch.randint(0, 100, (30,)), torch.randint(0, 50, (30,))])

data = HeteroData()
data['customer'].x = customer_features
data['product'].x = product_features

# Different edge types
data['customer', 'purchased', 'product'].edge_index = purchase_edges
data['customer', 'viewed', 'product'].edge_index = view_edges
data['customer', 'returned', 'product'].edge_index = return_edges

# HeteroConv: separate layer per edge type
conv = HeteroConv({
    ('customer', 'purchased', 'product'): SAGEConv(64, 64),
    ('customer', 'viewed', 'product'): SAGEConv(64, 64),
    ('customer', 'returned', 'product'): SAGEConv(64, 64),
})

# Forward: dicts keyed by node type and edge type
out_dict = conv(data.x_dict, data.edge_index_dict)  # {'product': [50, 64]}

HeteroConv dispatches a separate GNN layer per edge type. Each relation gets its own learned transformation.

Frequently asked questions

What is relation type encoding?

Relation type encoding is the process of representing different edge types (relationships) in a heterogeneous graph so the GNN can distinguish between them. In a retail graph, 'purchased,' 'viewed,' and 'returned' are all customer-to-product edges but carry very different semantic meanings.

Why can't you just use one edge type for everything?

Collapsing all edge types into one loses critical information. A 'returned' edge is the opposite signal of a 'purchased' edge for recommendation. A 'reported' edge is a different signal than a 'transferred_to' edge for fraud detection. The GNN must know the relationship type to aggregate messages correctly.

How does PyG handle multiple edge types?

PyG uses HeteroData with typed edge stores. Each edge type is a triplet (source_type, relation, target_type). Heterogeneous GNN layers like HeteroConv apply separate transformations per edge type, or you can use relation-specific weight matrices in a single layer.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.