Edge features are attribute vectors attached to graph edges that encode relationship properties such as transaction amounts, timestamps, interaction types, and weights. Incorporated into GNN message passing, they produce richer, more informative node representations. In enterprise relational databases, edges often carry as much information as nodes: a customer-product edge has a purchase amount, date, channel, and return status. Ignoring these features can mean discarding much of the predictive signal in the data. Edge-aware GNN layers incorporate this information directly into the message computation.
Why it matters for enterprise data
Consider a fraud detection model on a transaction graph. Without edge features, the model knows “Customer A transacted with Merchant B.” With edge features, it knows “Customer A spent $15,000 at Merchant B at 3:42 AM via wire transfer.” The edge features (amount, time, channel) transform a generic connection into a rich, informative signal.
Common enterprise edge features:
- Transaction amount: $10 grocery purchase vs. $50,000 wire transfer
- Timestamp: recent transaction vs. year-old transaction
- Interaction type: purchased, returned, viewed, wishlisted
- Frequency: one-time connection vs. recurring relationship
- Edge weight: call duration, shipping cost, contract value
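As a minimal sketch of how features like these become a model-ready `edge_attr` tensor: the log transform for amounts, the 30-day recency decay, and the channel codes below are all illustrative assumptions, not a fixed recipe.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw attributes for three transaction edges
amounts = torch.tensor([10.0, 50000.0, 250.0])
days_since = torch.tensor([2.0, 365.0, 30.0])
channel = torch.tensor([0, 2, 1])            # assumed codes: 0=card, 1=ach, 2=wire

amount_norm = torch.log1p(amounts)           # compress heavy-tailed amounts
recency = torch.exp(-days_since / 30.0)      # recent edges near 1, old edges near 0
channel_onehot = F.one_hot(channel, 3).float()

edge_attr = torch.cat([amount_norm.unsqueeze(-1),
                       recency.unsqueeze(-1),
                       channel_onehot], dim=-1)   # shape [num_edges, 5]
```

The resulting `[num_edges, edge_dim]` tensor is exactly what edge-aware layers expect as their `edge_attr` argument.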
How edge features integrate into message passing
Approach 1: Message modification
Concatenate edge features with the neighbor's node features before computing the message. The message function sees both who the neighbor is and what the relationship looks like.
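This concatenation step can be sketched without any GNN library. The hypothetical `concat_message_passing` below implements one mean-aggregated message-passing round in plain PyTorch, joining each neighbor's features with the edge features before a linear message transform:

```python
import torch

def concat_message_passing(x, edge_index, edge_attr, weight):
    # x: [N, F] node features, edge_index: [2, E], edge_attr: [E, D],
    # weight: [F + D, F_out] message transform
    src, dst = edge_index
    msgs = torch.cat([x[src], edge_attr], dim=-1) @ weight   # [E, F_out]
    out = torch.zeros(x.size(0), weight.size(1))
    out.index_add_(0, dst, msgs)                             # sum messages per target
    deg = torch.zeros(x.size(0)).index_add_(0, dst, torch.ones(dst.size(0)))
    return out / deg.clamp(min=1).unsqueeze(-1)              # mean aggregation
```

In PyTorch Geometric the same idea lives in a custom `MessagePassing` subclass whose `message()` concatenates `x_j` with `edge_attr`.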
Approach 2: Message modulation
Use edge features to gate or scale the message. A high-value transaction edge amplifies the message. An old transaction edge dampens it. The edge features act as a learned filter.
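A minimal sketch of this gating idea, again in plain PyTorch: the hypothetical `gated_message_passing` maps edge features through a learned weight to a sigmoid gate in [0, 1] that scales each neighbor's message channel-wise.

```python
import torch

def gated_message_passing(x, edge_index, edge_attr, gate_weight):
    # x: [N, F], edge_index: [2, E], edge_attr: [E, D], gate_weight: [D, F]
    src, dst = edge_index
    gate = torch.sigmoid(edge_attr @ gate_weight)   # [E, F] per-channel gate
    msgs = x[src] * gate                            # amplify or dampen each message
    out = torch.zeros_like(x)
    out.index_add_(0, dst, msgs)                    # sum-aggregate per target node
    return out
```

During training, `gate_weight` learns which edge properties (say, transaction age) should suppress a message and which should amplify it.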
Approach 3: Attention modification
In attention-based layers (GATConv, TransformerConv), edge features modify the attention score. A recent, high-value transaction gets higher attention than an old, small one.
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import TransformerConv, GINEConv
from torch.nn import Sequential, Linear, ReLU

# TransformerConv: edge features enter the attention computation
class EdgeAwareGNN(torch.nn.Module):
    def __init__(self, node_dim, edge_dim, hidden_dim, out_dim):
        super().__init__()
        # TransformerConv natively supports edge_attr via the edge_dim argument
        self.conv1 = TransformerConv(node_dim, hidden_dim, edge_dim=edge_dim)
        self.conv2 = TransformerConv(hidden_dim, out_dim, edge_dim=edge_dim)

    def forward(self, x, edge_index, edge_attr):
        x = F.relu(self.conv1(x, edge_index, edge_attr))
        x = self.conv2(x, edge_index, edge_attr)
        return x

# GINEConv: edge features added to messages (maximum expressiveness)
mlp = Sequential(Linear(64, 64), ReLU(), Linear(64, 64))
gine = GINEConv(mlp, edge_dim=64)  # edge_dim projects edge_attr to the node dimension
# forward: gine(x, edge_index, edge_attr)

# Prepare edge features:
# data.edge_attr has shape [num_edges, edge_dim],
# e.g., [amount_normalized, days_since, one_hot_channel]
```

TransformerConv uses edge features in the attention computation. GINEConv adds edge features to messages for maximum expressiveness. Both accept edge_attr directly.
Concrete example: churn prediction with interaction features
A subscription service wants to predict churn. The graph has:
- Customer nodes: features = [tenure, plan_tier]
- Content nodes: features = [genre, release_date, rating]
- Edges (customer watched content): features = [watch_duration, completion_rate, days_ago, device_type]
Without edge features, the model knows a customer watched a movie. With edge features, it knows the customer watched 15 minutes of a 2-hour movie on mobile 3 days ago (low engagement signal). This distinction is critical: a customer who consistently watches only 10% of content is far more likely to churn than one who completes 90%.
The edge-aware model learns that low completion_rate edges are strong churn indicators, while high completion_rate edges from recent days_ago are retention signals.
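The customer-content edge features described above can be assembled like this; the 2-hour duration reference, weekly recency decay, and device codes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical watch events: [watch_duration_min, completion_rate, days_ago]
events = torch.tensor([[15.0, 0.12, 3.0],     # 15 min of a 2-hour movie
                       [110.0, 0.92, 1.0]])   # near-complete recent watch
device = torch.tensor([1, 0])                 # assumed codes: 0=tv, 1=mobile

duration = events[:, 0:1] / 120.0             # normalize by a 2-hour reference
completion = events[:, 1:2]
recency = torch.exp(-events[:, 2:3] / 7.0)    # weekly decay: recent near 1
edge_attr = torch.cat([duration, completion, recency,
                       F.one_hot(device, 2).float()], dim=-1)  # [num_edges, 5]
```

Fed as `edge_attr` to an edge-aware layer, the first row (low duration, low completion, mobile) carries the low-engagement signal the model needs to flag churn risk.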
Limitations and what comes next
- Not all layers support edge features: GCNConv accepts only a scalar edge weight, not feature vectors, and basic SAGEConv ignores edge attributes entirely. You must use TransformerConv, GINEConv, NNConv, or custom MessagePassing layers, which limits architecture choices.
- Edge feature dimensionality: High-dimensional edge features increase memory proportional to the number of edges. For enterprise graphs with billions of edges, edge feature storage can be the memory bottleneck.
- Categorical encoding: Edge types (purchased, returned, viewed) need encoding. One-hot encoding for many types is memory-intensive. Learned edge type embeddings are more efficient.
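A minimal sketch of the learned-embedding alternative: each edge type id indexes a trainable embedding table, which is concatenated with any continuous edge features (the type count, embedding size, and column layout below are assumptions).

```python
import torch

num_edge_types = 3                                 # purchased, returned, viewed
type_emb = torch.nn.Embedding(num_edge_types, 8)   # learned 8-dim type embedding

edge_type = torch.tensor([0, 2, 2, 1])             # one type id per edge
continuous = torch.rand(4, 2)                      # e.g. [amount_norm, recency]
edge_attr = torch.cat([continuous, type_emb(edge_type)], dim=-1)  # [4, 10]
```

Unlike one-hot encoding, the table costs `num_edge_types * emb_dim` parameters regardless of how many types exist, and the embeddings are trained jointly with the GNN.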
KumoRFM's Relational Graph Transformer uses edge features extensively, encoding both the relationship type and temporal information for each edge in the relational graph. This is part of how it achieves 81.14 fine-tuned AUROC on RelBench.