Edge features are attribute vectors attached to graph edges that encode relationship properties such as transaction amounts, timestamps, interaction types, and weights. Incorporated into GNN message passing, they produce richer, more informative node representations. In enterprise relational databases, edges often carry as much information as nodes: a customer-product edge has a purchase amount, date, channel, and return status. Ignoring these features can mean discarding much of the predictive signal in the data. Edge-aware GNN layers incorporate this information directly into the message computation.
Why it matters for enterprise data
Consider a fraud detection model on a transaction graph. Without edge features, the model knows “Customer A transacted with Merchant B.” With edge features, it knows “Customer A spent $15,000 at Merchant B at 3:42 AM via wire transfer.” The edge features (amount, time, channel) transform a generic connection into a rich, informative signal.
Common enterprise edge features:
- Transaction amount: $10 grocery purchase vs. $50,000 wire transfer
- Timestamp: recent transaction vs. year-old transaction
- Interaction type: purchased, returned, viewed, wishlisted
- Frequency: one-time connection vs. recurring relationship
- Edge weight: call duration, shipping cost, contract value
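As a minimal sketch of how features like these become a model-ready `edge_attr` tensor: the log transform for amounts, the 30-day recency decay, and the channel codes below are all illustrative assumptions, not a fixed recipe.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw attributes for three transaction edges
amounts = torch.tensor([10.0, 50000.0, 250.0])
days_since = torch.tensor([2.0, 365.0, 30.0])
channel = torch.tensor([0, 2, 1])            # assumed codes: 0=card, 1=ach, 2=wire

amount_norm = torch.log1p(amounts)           # compress heavy-tailed amounts
recency = torch.exp(-days_since / 30.0)      # recent edges near 1, old edges near 0
channel_onehot = F.one_hot(channel, 3).float()

edge_attr = torch.cat([amount_norm.unsqueeze(-1),
                       recency.unsqueeze(-1),
                       channel_onehot], dim=-1)   # shape [num_edges, 5]
```

The resulting `[num_edges, edge_dim]` tensor is exactly what edge-aware layers expect as their `edge_attr` argument.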
How edge features integrate into message passing
Approach 1: Message modification
Concatenate edge features with the neighbor's node features before computing the message. The message function sees both who the neighbor is and what the relationship looks like.
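This concatenation step can be sketched without any GNN library. The hypothetical `concat_message_passing` below implements one mean-aggregated message-passing round in plain PyTorch, joining each neighbor's features with the edge features before a linear message transform:

```python
import torch

def concat_message_passing(x, edge_index, edge_attr, weight):
    # x: [N, F] node features, edge_index: [2, E], edge_attr: [E, D],
    # weight: [F + D, F_out] message transform
    src, dst = edge_index
    msgs = torch.cat([x[src], edge_attr], dim=-1) @ weight   # [E, F_out]
    out = torch.zeros(x.size(0), weight.size(1))
    out.index_add_(0, dst, msgs)                             # sum messages per target
    deg = torch.zeros(x.size(0)).index_add_(0, dst, torch.ones(dst.size(0)))
    return out / deg.clamp(min=1).unsqueeze(-1)              # mean aggregation
```

In PyTorch Geometric the same idea lives in a custom `MessagePassing` subclass whose `message()` concatenates `x_j` with `edge_attr`.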
Approach 2: Message modulation
Use edge features to gate or scale the message. A high-value transaction edge amplifies the message. An old transaction edge dampens it. The edge features act as a learned filter.
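A minimal sketch of this gating idea, again in plain PyTorch: the hypothetical `gated_message_passing` maps edge features through a learned weight to a sigmoid gate in [0, 1] that scales each neighbor's message channel-wise.

```python
import torch

def gated_message_passing(x, edge_index, edge_attr, gate_weight):
    # x: [N, F], edge_index: [2, E], edge_attr: [E, D], gate_weight: [D, F]
    src, dst = edge_index
    gate = torch.sigmoid(edge_attr @ gate_weight)   # [E, F] per-channel gate
    msgs = x[src] * gate                            # amplify or dampen each message
    out = torch.zeros_like(x)
    out.index_add_(0, dst, msgs)                    # sum-aggregate per target node
    return out
```

During training, `gate_weight` learns which edge properties (say, transaction age) should suppress a message and which should amplify it.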
Approach 3: Attention modification
In attention-based layers (GATConv, TransformerConv), edge features modify the attention score. A recent, high-value transaction gets higher attention than an old, small one.
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import TransformerConv, GINEConv
from torch.nn import Sequential, Linear, ReLU

# TransformerConv: edge features enter the attention computation
class EdgeAwareGNN(torch.nn.Module):
    def __init__(self, node_dim, edge_dim, hidden_dim, out_dim):
        super().__init__()
        # TransformerConv natively supports edge_attr via the edge_dim argument
        self.conv1 = TransformerConv(node_dim, hidden_dim, edge_dim=edge_dim)
        self.conv2 = TransformerConv(hidden_dim, out_dim, edge_dim=edge_dim)

    def forward(self, x, edge_index, edge_attr):
        x = F.relu(self.conv1(x, edge_index, edge_attr))
        x = self.conv2(x, edge_index, edge_attr)
        return x

# GINEConv: edge features added to messages (maximum expressiveness)
mlp = Sequential(Linear(64, 64), ReLU(), Linear(64, 64))
gine = GINEConv(mlp, edge_dim=64)  # edge_dim projects edge_attr to the node dimension
# forward: gine(x, edge_index, edge_attr)

# Prepare edge features:
# data.edge_attr has shape [num_edges, edge_dim],
# e.g., [amount_normalized, days_since, one_hot_channel]
```

TransformerConv uses edge features in the attention computation. GINEConv adds edge features to messages for maximum expressiveness. Both accept edge_attr directly.
Concrete example: churn prediction with interaction features
A subscription service wants to predict churn. The graph has:
- Customer nodes: features = [tenure, plan_tier]
- Content nodes: features = [genre, release_date, rating]
- Edges (customer watched content): features = [watch_duration, completion_rate, days_ago, device_type]
Without edge features, the model knows a customer watched a movie. With edge features, it knows the customer watched 15 minutes of a 2-hour movie on mobile 3 days ago (low engagement signal). This distinction is critical: a customer who consistently watches only 10% of content is far more likely to churn than one who completes 90%.
The edge-aware model learns that low completion_rate edges are strong churn indicators, while high completion_rate edges from recent days_ago are retention signals.
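The customer-content edge features described above can be assembled like this; the 2-hour duration reference, weekly recency decay, and device codes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical watch events: [watch_duration_min, completion_rate, days_ago]
events = torch.tensor([[15.0, 0.12, 3.0],     # 15 min of a 2-hour movie
                       [110.0, 0.92, 1.0]])   # near-complete recent watch
device = torch.tensor([1, 0])                 # assumed codes: 0=tv, 1=mobile

duration = events[:, 0:1] / 120.0             # normalize by a 2-hour reference
completion = events[:, 1:2]
recency = torch.exp(-events[:, 2:3] / 7.0)    # weekly decay: recent near 1
edge_attr = torch.cat([duration, completion, recency,
                       F.one_hot(device, 2).float()], dim=-1)  # [num_edges, 5]
```

Fed as `edge_attr` to an edge-aware layer, the first row (low duration, low completion, mobile) carries the low-engagement signal the model needs to flag churn risk.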
Limitations and what comes next
- Not all layers support edge features: GCNConv accepts only a scalar edge weight, not feature vectors, and basic SAGEConv ignores edge attributes entirely. You must use TransformerConv, GINEConv, NNConv, or custom MessagePassing layers, which limits architecture choices.
- Edge feature dimensionality: High-dimensional edge features increase memory proportional to the number of edges. For enterprise graphs with billions of edges, edge feature storage can be the memory bottleneck.
- Categorical encoding: Edge types (purchased, returned, viewed) need encoding. One-hot encoding for many types is memory-intensive. Learned edge type embeddings are more efficient.
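A minimal sketch of the learned-embedding alternative: each edge type id indexes a trainable embedding table, which is concatenated with any continuous edge features (the type count, embedding size, and column layout below are assumptions).

```python
import torch

num_edge_types = 3                                 # purchased, returned, viewed
type_emb = torch.nn.Embedding(num_edge_types, 8)   # learned 8-dim type embedding

edge_type = torch.tensor([0, 2, 2, 1])             # one type id per edge
continuous = torch.rand(4, 2)                      # e.g. [amount_norm, recency]
edge_attr = torch.cat([continuous, type_emb(edge_type)], dim=-1)  # [4, 10]
```

Unlike one-hot encoding, the table costs `num_edge_types * emb_dim` parameters regardless of how many types exist, and the embeddings are trained jointly with the GNN.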
KumoRFM's Relational Graph Transformer uses edge features extensively, encoding both the relationship type and temporal information for each edge in the relational graph. This is part of how it achieves 81.14 fine-tuned AUROC on RelBench.