
RGATConv: Attention Meets Relational Graph Convolution

RGATConv combines the best of RGCNConv and GATConv: relation-specific weight matrices for heterogeneous edge types plus learned attention weights within each type. It is the natural upgrade when RGCN's equal-neighbor-treatment limits performance on your heterogeneous graph.


TL;DR

  • RGATConv = RGCNConv + GATConv. Relation-specific weight matrices handle different edge types, while attention learns which specific neighbors within each type matter most.
  • Natural upgrade from RGCNConv when not all edges of the same relation type carry equal signal (e.g., some purchases are more predictive than others).
  • Simpler than HGTConv: uses GAT-style attention per relation type rather than full transformer attention with type-specific projections.
  • Supports basis decomposition for many relation types. Practical for knowledge graphs with hundreds of edge types.
  • KumoRFM combines relation-specific attention with temporal encoding and scalable sampling, extending RGATConv's approach to production enterprise data.

Original Paper

Relational Graph Attention Networks

Busbridge et al. (2019), arXiv preprint


What RGATConv does

RGATConv applies attention-weighted aggregation with relation-specific transformations:

  1. For each relation type r, transform neighbor features using W_r
  2. Compute attention scores within each relation type (GAT-style)
  3. Aggregate attention-weighted messages per relation type
  4. Combine across all relation types

The math (simplified)

RGATConv formula
# Per-relation attention (GAT-style within each type)
e_ij^r = LeakyReLU( a_r^T · [W_r · h_i || W_r · h_j] )
alpha_ij^r = softmax_j(e_ij^r)  # normalize within relation r

# Per-relation aggregation
z_i^r = Σ_{j in N_r(i)} alpha_ij^r · W_r · h_j

# Combine across relations
h_i' = W_0 · h_i + Σ_r z_i^r

Where:
  W_r   = relation-specific weight matrix
  a_r   = relation-specific attention vector
  N_r(i) = neighbors of i via relation r

Each relation type has its own weight matrix AND attention parameters. This is more expressive than RGCN (no attention) and more type-aware than GAT (no relation-specific weights).
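To make the equations concrete, here is a from-scratch sketch of the simplified formula on a toy graph in plain PyTorch. This is a didactic per-relation loop, not the library's vectorized implementation; all tensors, sizes, and the tiny edge list are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

num_nodes, in_dim, out_dim, num_relations = 4, 8, 6, 2

# Toy graph: edge_index[:, k] = (source j, destination i); edge_type[k] = r
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 1, 3, 3]])
edge_type = torch.tensor([0, 0, 1, 1])

h = torch.randn(num_nodes, in_dim)

# Relation-specific parameters from the formula: W_r, a_r, plus W_0
W = torch.randn(num_relations, out_dim, in_dim)
W0 = torch.randn(out_dim, in_dim)
a = torch.randn(num_relations, 2 * out_dim)

out = h @ W0.T  # self term: W_0 · h_i
for r in range(num_relations):
    mask = edge_type == r
    src, dst = edge_index[0, mask], edge_index[1, mask]
    hs = h[src] @ W[r].T  # W_r · h_j (messages from neighbors)
    hd = h[dst] @ W[r].T  # W_r · h_i (destination side of the score)
    # e_ij^r = LeakyReLU(a_r^T [W_r h_i || W_r h_j])
    e = F.leaky_relu(torch.cat([hd, hs], dim=-1) @ a[r])
    # alpha_ij^r = softmax over incoming edges of the same destination
    alpha = torch.zeros_like(e)
    for i in dst.unique():
        sel = dst == i
        alpha[sel] = F.softmax(e[sel], dim=0)
    # z_i^r = Σ_j alpha_ij^r · W_r h_j, scattered back onto destinations
    out.index_add_(0, dst, alpha.unsqueeze(-1) * hs)

print(out.shape)  # torch.Size([4, 6])
```

The key structural point survives even in this toy version: attention scores are normalized only among edges that share both a destination node and a relation type.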

PyG implementation

rgat_model.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import RGATConv

class RGAT(torch.nn.Module):
    def __init__(self, in_channels, hidden, out_channels,
                 num_relations, heads=4):
        super().__init__()
        self.conv1 = RGATConv(in_channels, hidden,
                              num_relations=num_relations, heads=heads)
        self.conv2 = RGATConv(hidden * heads, out_channels,
                              num_relations=num_relations, heads=1,
                              concat=False)

    def forward(self, x, edge_index, edge_type):
        x = F.elu(self.conv1(x, edge_index, edge_type))
        x = self.conv2(x, edge_index, edge_type)
        return x

num_classes = 7  # e.g., number of node labels in your dataset
model = RGAT(in_channels=64, hidden=32, out_channels=num_classes,
             num_relations=5, heads=4)

The API mirrors both parents: like RGCNConv it takes edge_type, and like GATConv it takes heads and concat, a natural combination of the two interfaces.

When to use RGATConv

  • Heterogeneous graphs with variable neighbor importance. When you need both type-specific transformations and attention over individual neighbors.
  • Knowledge graph completion. Link prediction where different relation types need different treatment and some entities are more informative than others.
  • Fraud detection on relational data. Transaction types differ in semantics (RGCN aspect), and individual transactions differ in suspiciousness (GAT aspect).

When not to use RGATConv

  • Homogeneous graphs. Use GATConv directly. No relation types to specialize for.
  • When you need full transformer attention across types. HGTConv provides richer cross-type attention patterns. RGATConv attends within each type independently.

How KumoRFM builds on this

RGATConv combines two important ideas: relation-specific transformations and attention. KumoRFM extends both:

  • Cross-type attention (like HGTConv) that considers all relation types jointly
  • Temporal attention that weights recent interactions higher within each relation type
  • Scalable to production with sampling that respects both type and temporal constraints

Frequently asked questions

What is RGATConv in PyTorch Geometric?

RGATConv implements Relational Graph Attention Networks from Busbridge et al. (2019). It combines RGCNConv's relation-specific weight matrices with GATConv's attention mechanism, learning both type-specific transformations and attention weights over neighbors in heterogeneous graphs.

How does RGATConv differ from RGCNConv?

RGCNConv uses separate weight matrices per relation type but treats all neighbors of the same type equally. RGATConv adds attention within each relation type, so the model can learn that some 'purchase' edges are more important than others based on the node features involved.

How does RGATConv differ from HGTConv?

Both combine typed transformations with attention. RGATConv applies GAT-style attention within each relation type. HGTConv uses transformer-style attention with type-specific key/query/value projections. HGTConv is generally more expressive but more complex. RGATConv is simpler to understand and implement.

When should I use RGATConv?

Use RGATConv when you have a heterogeneous graph with typed edges and want attention over neighbors. It is a natural upgrade from RGCNConv when not all edges of the same type carry equal signal. Common use cases: knowledge graph completion, fraud detection in multi-relational networks.

Does RGATConv support basis decomposition?

Yes. Like RGCNConv, RGATConv supports basis decomposition to reduce parameters when you have many relation types. This makes it practical for knowledge graphs with hundreds of relation types.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.