What RGATConv does
RGATConv applies attention-weighted aggregation with relation-specific transformations:
- For each relation type r, transform neighbor features using W_r
- Compute attention scores within each relation type (GAT-style)
- Aggregate attention-weighted messages per relation type
- Combine across all relation types
The math (simplified)
```
# Per-relation attention (GAT-style within each type)
e_ij^r = LeakyReLU( a_r^T · [W_r · h_i || W_r · h_j] )
alpha_ij^r = softmax_j(e_ij^r)   # normalize within relation r

# Per-relation aggregation
z_i^r = Σ_{j in N_r(i)} alpha_ij^r · W_r · h_j

# Combine across relations
h_i' = W_0 · h_i + Σ_r z_i^r
```
Where:
- W_r = relation-specific weight matrix
- a_r = relation-specific attention vector
- N_r(i) = neighbors of i via relation r

Each relation type has its own weight matrix AND attention parameters. This makes RGAT more expressive than RGCN (which has no attention) and more type-aware than GAT (which has no relation-specific weights).
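The equations can be traced by hand for one target node. The sketch below is illustrative only (not the PyG implementation); the shapes, relation count, and random weights are arbitrary assumptions:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_in, d_out, num_relations = 4, 8, 2

h_i = torch.randn(d_in)                    # target node i's features
neighbors = {0: torch.randn(3, d_in),      # N_0(i): 3 neighbors via relation 0
             1: torch.randn(2, d_in)}      # N_1(i): 2 neighbors via relation 1

W = [torch.randn(d_out, d_in) for _ in range(num_relations)]   # W_r
a = [torch.randn(2 * d_out) for _ in range(num_relations)]     # a_r
W0 = torch.randn(d_out, d_in)                                  # self weight W_0

h_out = W0 @ h_i                           # W_0 · h_i
for r in range(num_relations):
    Wh_i = W[r] @ h_i                      # W_r · h_i
    Wh_j = neighbors[r] @ W[r].T           # W_r · h_j, one row per neighbor
    # e_ij^r = LeakyReLU(a_r^T · [W_r h_i || W_r h_j])
    e = F.leaky_relu(
        torch.stack([a[r] @ torch.cat([Wh_i, wh]) for wh in Wh_j]))
    alpha = torch.softmax(e, dim=0)        # alpha_ij^r: normalized within r
    h_out = h_out + (alpha.unsqueeze(1) * Wh_j).sum(0)  # add z_i^r

print(h_out.shape)  # torch.Size([8])
```

Note that the softmax runs separately per relation, so attention weights for relation 0's neighbors never compete with relation 1's.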
PyG implementation
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import RGATConv

class RGAT(torch.nn.Module):
    def __init__(self, in_channels, hidden, out_channels,
                 num_relations, heads=4):
        super().__init__()
        self.conv1 = RGATConv(in_channels, hidden,
                              num_relations=num_relations, heads=heads)
        self.conv2 = RGATConv(hidden * heads, out_channels,
                              num_relations=num_relations, heads=1,
                              concat=False)

    def forward(self, x, edge_index, edge_type):
        x = F.elu(self.conv1(x, edge_index, edge_type))
        x = self.conv2(x, edge_index, edge_type)
        return x

model = RGAT(in_channels=64, hidden=32, out_channels=num_classes,
             num_relations=5, heads=4)  # num_classes defined elsewhere
```

The API is RGCNConv's (takes edge_type) combined with GATConv's (takes heads and concat): a natural union of the two interfaces.
When to use RGATConv
- Heterogeneous graphs with variable neighbor importance. When you need both type-specific transformations and attention over individual neighbors.
- Knowledge graph completion. Link prediction where different relation types need different treatment and some entities are more informative than others.
- Fraud detection on relational data. Transaction types differ in semantics (RGCN aspect), and individual transactions differ in suspiciousness (GAT aspect).
When not to use RGATConv
- Homogeneous graphs. Use GATConv directly. No relation types to specialize for.
- When you need full transformer attention across types. HGTConv provides richer cross-type attention patterns. RGATConv attends within each type independently.
How KumoRFM builds on this
RGATConv combines two important ideas: relation-specific transformations and attention. KumoRFM extends both:
- Cross-type attention (like HGTConv) that considers all relation types jointly
- Temporal attention that weights recent interactions higher within each relation type
- Production-scale sampling that respects both type and temporal constraints