
Edge Features: Using Edge Attributes in Message Passing

Enterprise data lives on edges as much as on nodes. Transaction amounts, timestamps, relationship types, and weights are all edge features. Edge-aware GNNs incorporate this information into message passing for richer predictions.

PyTorch Geometric

TL;DR

  • Edge features are attribute vectors on graph edges: transaction amounts, timestamps, relationship types, weights. They carry as much predictive signal as node features in enterprise data.
  • Standard GCNConv ignores edge features. Edge-aware layers (GATConv, TransformerConv, NNConv, GINEConv) incorporate them into message computation.
  • Three integration approaches: concatenate with neighbor features, modulate messages (multiply/gate), or modify attention scores. TransformerConv uses edge features in attention and is the most flexible.
  • For enterprise data: a $10,000 transaction edge carries different signal than a $10 edge. A recent edge matters more than a year-old edge. Edge features capture these distinctions.
  • In PyG: store edge features in data.edge_attr. Pass them to layers that accept an edge_attr parameter. Encode categorical features as embeddings; normalize continuous features.

Edge features are attribute vectors attached to graph edges that encode relationship properties such as transaction amounts, timestamps, interaction types, and weights. When incorporated into GNN message passing, they produce richer, more informative node representations. In enterprise relational databases, edges often carry as much information as nodes: a customer-product edge has a purchase amount, date, channel, and return status. Ignoring these features means ignoring half the predictive signal in the data. Edge-aware GNN layers incorporate this information directly into the message computation.

Why it matters for enterprise data

Consider a fraud detection model on a transaction graph. Without edge features, the model knows “Customer A transacted with Merchant B.” With edge features, it knows “Customer A spent $15,000 at Merchant B at 3:42 AM via wire transfer.” The edge features (amount, time, channel) transform a generic connection into a rich, informative signal.

Common enterprise edge features:

  • Transaction amount: $10 grocery purchase vs. $50,000 wire transfer
  • Timestamp: recent transaction vs. year-old transaction
  • Interaction type: purchased, returned, viewed, wishlisted
  • Frequency: one-time connection vs. recurring relationship
  • Edge weight: call duration, shipping cost, contract value

How edge features integrate into message passing

Approach 1: Message modification

Concatenate edge features with the neighbor's node features before computing the message. The message function sees both who the neighbor is and what the relationship looks like.

Approach 2: Message modulation

Use edge features to gate or scale the message. A high-value transaction edge amplifies the message. An old transaction edge dampens it. The edge features act as a learned filter.

Approach 3: Attention modification

In attention-based layers (GATConv, TransformerConv), edge features modify the attention score. A recent, high-value transaction gets higher attention than an old, small one.

edge_features_pyg.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import TransformerConv, GINEConv
from torch.nn import Sequential, Linear, ReLU

# TransformerConv: edge features in attention computation
class EdgeAwareGNN(torch.nn.Module):
    def __init__(self, node_dim, edge_dim, hidden_dim, out_dim):
        super().__init__()
        # TransformerConv natively supports edge_attr
        self.conv1 = TransformerConv(node_dim, hidden_dim, edge_dim=edge_dim)
        self.conv2 = TransformerConv(hidden_dim, out_dim, edge_dim=edge_dim)

    def forward(self, x, edge_index, edge_attr):
        x = F.relu(self.conv1(x, edge_index, edge_attr))
        x = self.conv2(x, edge_index, edge_attr)
        return x

# GINEConv: edge features added to messages (maximum expressiveness)
mlp = Sequential(Linear(64, 64), ReLU(), Linear(64, 64))
# edge_dim projects edge_attr to the node feature dimension before addition
gine = GINEConv(mlp, edge_dim=16)
# forward: gine(x, edge_index, edge_attr)

# Prepare edge features
# data.edge_attr = [num_edges, edge_dim]
# e.g., [amount_normalized, days_since, one_hot_channel]

TransformerConv uses edge features in the attention computation. GINEConv adds edge features to messages for maximum expressiveness. Both accept edge_attr directly.

Concrete example: churn prediction with interaction features

A subscription service wants to predict churn. The graph has:

  • Customer nodes: features = [tenure, plan_tier]
  • Content nodes: features = [genre, release_date, rating]
  • Edges (customer watched content): features = [watch_duration, completion_rate, days_ago, device_type]

Without edge features, the model knows a customer watched a movie. With edge features, it knows the customer watched 15 minutes of a 2-hour movie on mobile 3 days ago (low engagement signal). This distinction is critical: a customer who consistently watches only 10% of content is far more likely to churn than one who completes 90%.

The edge-aware model learns that low completion_rate edges are strong churn indicators, while high completion_rate edges from recent days_ago are retention signals.

Limitations and what comes next

  1. Not all layers support edge features: GCNConv and basic SAGEConv do not. You must use GATConv (with edge_dim), TransformerConv, GINEConv, NNConv, or custom MessagePassing layers. This limits architecture choices.
  2. Edge feature dimensionality: High-dimensional edge features increase memory proportional to the number of edges. For enterprise graphs with billions of edges, edge feature storage can be the memory bottleneck.
  3. Categorical encoding: Edge types (purchased, returned, viewed) need encoding. One-hot encoding for many types is memory-intensive. Learned edge type embeddings are more efficient.
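The embedding approach from point 3 is a few lines with torch.nn.Embedding. The type vocabulary and dimensions below are illustrative:

```python
import torch

# Learned embeddings for categorical edge types instead of one-hot vectors
EDGE_TYPES = {'purchased': 0, 'returned': 1, 'viewed': 2}
type_emb = torch.nn.Embedding(num_embeddings=len(EDGE_TYPES), embedding_dim=8)

edge_type = torch.tensor([0, 2, 2, 1])         # one type id per edge
type_features = type_emb(edge_type)            # [num_edges, 8], trainable

# Concatenate with continuous edge features to form the final edge_attr
amount = torch.randn(4, 1)                     # placeholder normalized amounts
edge_attr = torch.cat([type_features, amount], dim=-1)  # [4, 9]
```

For many edge types this stays at a fixed embedding_dim per edge, whereas one-hot width grows with the vocabulary.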

KumoRFM's Relational Graph Transformer uses edge features extensively, encoding both the relationship type and temporal information for each edge in the relational graph. This is part of how it achieves 81.14 fine-tuned AUROC on RelBench.

Frequently asked questions

What are edge features in GNNs?

Edge features are attribute vectors attached to edges in a graph. In enterprise data, edge features include transaction amounts, timestamps, relationship types (purchased, returned, viewed), interaction frequency, and edge weights. Standard GNN layers like GCNConv ignore edge features, using only the graph topology. Edge-aware layers (NNConv, GATConv with edge_attr, TransformerConv) incorporate edge features into the message computation.

Why are edge features important for enterprise data?

In enterprise relational databases, edges carry as much information as nodes. A customer-product edge has an amount, a date, a channel, and a discount flag. A supplier-manufacturer edge has a lead time, reliability score, and contract terms. Ignoring these features means ignoring half the information in the database. Edge-aware GNNs use this information to compute more informative messages.

Which PyG layers support edge features?

GATConv (attention modified by edge features), GATv2Conv, TransformerConv (edge features in attention computation), NNConv (edge features define a per-edge neural network), GINEConv (GIN with edge features added to messages), and GeneralConv (configurable edge feature support). GCNConv and basic SAGEConv do not support edge features natively.

How are edge features used in message passing?

Three approaches: (1) Concatenate edge features with neighbor features before computing the message. (2) Use edge features to modulate the message (multiply or gate). (3) Use edge features in the attention computation (TransformerConv). The first is simplest. The third is most powerful. All allow the model to learn edge-dependent message functions.

How do I prepare edge features for a GNN?

In PyG, edge features are stored in data.edge_attr with shape [num_edges, num_edge_features]. Each edge_attr[i] corresponds to the edge defined by edge_index[:, i]. Encode categorical edge features (relationship type) as one-hot or learned embeddings. Normalize continuous features (amounts, timestamps). Pass edge_attr to the GNN layer's forward method.
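A short sketch of that preparation, with made-up values. The log-scaling and min-max choices are one reasonable recipe, not the only one:

```python
import torch

# edge_attr[i] must describe the edge stored in edge_index[:, i]
edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 0]])

amounts = torch.tensor([[10.0], [50_000.0], [250.0]])
days_since = torch.tensor([[3.0], [400.0], [30.0]])

# Heavy-tailed amounts: log-scale, then standardize
amounts = torch.log1p(amounts)
amounts = (amounts - amounts.mean()) / amounts.std()

# Recency: min-max scale to [0, 1]
days_since = days_since / days_since.max()

edge_attr = torch.cat([amounts, days_since], dim=-1)   # shape [3, 2]
```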

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.