
RGATConv: Attention Meets Relational Graph Convolution

RGATConv combines the best of RGCNConv and GATConv: relation-specific weight matrices for heterogeneous edge types plus learned attention weights within each type. It is the natural upgrade when RGCN's equal-neighbor-treatment limits performance on your heterogeneous graph.


TL;DR

  • RGATConv = RGCNConv + GATConv. Relation-specific weight matrices handle different edge types, while attention learns which specific neighbors within each type matter most.
  • Natural upgrade from RGCNConv when not all edges of the same relation type carry equal signal (e.g., some purchases are more predictive than others).
  • Simpler than HGTConv: uses GAT-style attention per relation type rather than full transformer attention with type-specific projections.
  • Supports basis decomposition for many relation types. Practical for knowledge graphs with hundreds of edge types.
  • KumoRFM combines relation-specific attention with temporal encoding and scalable sampling, extending RGATConv's approach to production enterprise data.

Original Paper

Relational Graph Attention Networks

Busbridge et al. (2019), arXiv preprint


What RGATConv does

RGATConv applies attention-weighted aggregation with relation-specific transformations:

  1. For each relation type r, transform neighbor features using W_r
  2. Compute attention scores within each relation type (GAT-style)
  3. Aggregate attention-weighted messages per relation type
  4. Combine across all relation types

The math (simplified)

RGATConv formula
# Per-relation attention (GAT-style within each type)
e_ij^r = LeakyReLU( a_r^T · [W_r · h_i || W_r · h_j] )
alpha_ij^r = softmax_j(e_ij^r)  # normalize within relation r

# Per-relation aggregation
z_i^r = Σ_{j in N_r(i)} alpha_ij^r · W_r · h_j

# Combine across relations
h_i' = W_0 · h_i + Σ_r z_i^r

Where:
  W_r   = relation-specific weight matrix
  a_r   = relation-specific attention vector
  N_r(i) = neighbors of i via relation r

Each relation type has its own weight matrix AND attention parameters. This is more expressive than RGCN (no attention) and more type-aware than GAT (no relation-specific weights).
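To make the equations concrete, here is a from-scratch sketch of the simplified formula on a toy graph in plain PyTorch. This is a didactic per-relation loop, not the library's vectorized implementation; all tensors, sizes, and the tiny edge list are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

num_nodes, in_dim, out_dim, num_relations = 4, 8, 6, 2

# Toy graph: edge_index[:, k] = (source j, destination i); edge_type[k] = r
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 1, 3, 3]])
edge_type = torch.tensor([0, 0, 1, 1])

h = torch.randn(num_nodes, in_dim)

# Relation-specific parameters from the formula: W_r, a_r, plus W_0
W = torch.randn(num_relations, out_dim, in_dim)
W0 = torch.randn(out_dim, in_dim)
a = torch.randn(num_relations, 2 * out_dim)

out = h @ W0.T  # self term: W_0 · h_i
for r in range(num_relations):
    mask = edge_type == r
    src, dst = edge_index[0, mask], edge_index[1, mask]
    hs = h[src] @ W[r].T  # W_r · h_j (messages from neighbors)
    hd = h[dst] @ W[r].T  # W_r · h_i (destination side of the score)
    # e_ij^r = LeakyReLU(a_r^T [W_r h_i || W_r h_j])
    e = F.leaky_relu(torch.cat([hd, hs], dim=-1) @ a[r])
    # alpha_ij^r = softmax over incoming edges of the same destination
    alpha = torch.zeros_like(e)
    for i in dst.unique():
        sel = dst == i
        alpha[sel] = F.softmax(e[sel], dim=0)
    # z_i^r = Σ_j alpha_ij^r · W_r h_j, scattered back onto destinations
    out.index_add_(0, dst, alpha.unsqueeze(-1) * hs)

print(out.shape)  # torch.Size([4, 6])
```

The key structural point survives even in this toy version: attention scores are normalized only among edges that share both a destination node and a relation type.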

PyG implementation

rgat_model.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import RGATConv

class RGAT(torch.nn.Module):
    def __init__(self, in_channels, hidden, out_channels,
                 num_relations, heads=4):
        super().__init__()
        self.conv1 = RGATConv(in_channels, hidden,
                              num_relations=num_relations, heads=heads)
        self.conv2 = RGATConv(hidden * heads, out_channels,
                              num_relations=num_relations, heads=1,
                              concat=False)

    def forward(self, x, edge_index, edge_type):
        x = F.elu(self.conv1(x, edge_index, edge_type))
        x = self.conv2(x, edge_index, edge_type)
        return x

num_classes = 7  # e.g., number of node labels in your dataset
model = RGAT(in_channels=64, hidden=32, out_channels=num_classes,
             num_relations=5, heads=4)

The API mirrors both parents: like RGCNConv it takes edge_type, and like GATConv it takes heads and concat, a natural combination of the two interfaces.

When to use RGATConv

  • Heterogeneous graphs with variable neighbor importance. When you need both type-specific transformations and attention over individual neighbors.
  • Knowledge graph completion. Link prediction where different relation types need different treatment and some entities are more informative than others.
  • Fraud detection on relational data. Transaction types differ in semantics (RGCN aspect), and individual transactions differ in suspiciousness (GAT aspect).

When not to use RGATConv

  • Homogeneous graphs. Use GATConv directly. No relation types to specialize for.
  • When you need full transformer attention across types. HGTConv provides richer cross-type attention patterns. RGATConv attends within each type independently.

How KumoRFM builds on this

RGATConv combines two important ideas: relation-specific transformations and attention. KumoRFM extends both:

  • Cross-type attention (like HGTConv) that considers all relation types jointly
  • Temporal attention that weights recent interactions higher within each relation type
  • Scalable to production with sampling that respects both type and temporal constraints

Frequently asked questions

What is RGATConv in PyTorch Geometric?

RGATConv implements Relational Graph Attention Networks from Busbridge et al. (2019). It combines RGCNConv's relation-specific weight matrices with GATConv's attention mechanism, learning both type-specific transformations and attention weights over neighbors in heterogeneous graphs.

How does RGATConv differ from RGCNConv?

RGCNConv uses separate weight matrices per relation type but treats all neighbors of the same type equally. RGATConv adds attention within each relation type, so the model can learn that some 'purchase' edges are more important than others based on the node features involved.

How does RGATConv differ from HGTConv?

Both combine typed transformations with attention. RGATConv applies GAT-style attention within each relation type. HGTConv uses transformer-style attention with type-specific key/query/value projections. HGTConv is generally more expressive but more complex. RGATConv is simpler to understand and implement.

When should I use RGATConv?

Use RGATConv when you have a heterogeneous graph with typed edges and want attention over neighbors. It is a natural upgrade from RGCNConv when not all edges of the same type carry equal signal. Common use cases: knowledge graph completion, fraud detection in multi-relational networks.

Does RGATConv support basis decomposition?

Yes. Like RGCNConv, RGATConv supports basis decomposition to reduce parameters when you have many relation types. This makes it practical for knowledge graphs with hundreds of relation types.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.