
HGTConv: Transformer Attention Meets Heterogeneous Graphs

HGTConv combines transformer-style attention with full heterogeneous type awareness. It learns that 'User purchases Product' calls for different attention than 'User reviews Product', making it a strong default for enterprise data with multiple entity and relationship types.

PyTorch Geometric

TL;DR

  • HGTConv uses type-specific key, query, and value projections: different parameters for each (source_type, edge_type, target_type) triplet. This is transformer attention adapted for heterogeneous graphs.
  • More expressive than RGCNConv (no attention) and more type-aware than TransformerConv (homogeneous). The right choice for complex multi-type graphs.
  • Supports mini-batch training via PyG's HGTLoader for scalable type-aware neighbor sampling.
  • Best for enterprise data where multiple entity types interact through different relationship types: e-commerce, financial networks, social platforms.

Original Paper

Heterogeneous Graph Transformer

Hu et al. (2020). WWW 2020

Read paper →

What HGTConv does

HGTConv applies transformer attention with type-specific projections. For each target node:

  1. Project the target node's features into a query using a type-specific W_Q
  2. For each neighbor, project features into key and value using source-type-specific W_K and W_V
  3. Compute attention scores using an edge-type-specific attention function
  4. Aggregate neighbor values weighted by attention

The key difference from TransformerConv: every projection depends on the types involved. A “User” query uses different parameters than a “Product” query. A “purchases” edge computes attention differently than a “reviews” edge.

The math (simplified)

HGTConv formula
# Type-specific projections
Q_i     = W_Q[type(i)] · h_i          # query depends on target type
K_j     = W_K[type(j)] · h_j          # key depends on source type
V_j     = W_V[type(j)] · h_j          # value depends on source type

# Type-specific attention
alpha_ij = softmax_j(
    Q_i^T · W_ATT[type(e_ij)] · K_j / sqrt(d)
)

# Weighted aggregation with type-specific message
h_i' = Σ_j alpha_ij · W_MSG[type(e_ij)] · V_j

Where:
  type(i)    = node type of i (User, Product, etc.)
  type(e_ij) = edge type connecting i and j (purchases, reviews, etc.)
  W_ATT, W_MSG = edge-type-specific matrices

Every projection and attention computation is conditioned on node types and edge types. This is the most type-aware attention mechanism in standard PyG layers.
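The simplified formula can be sketched directly in PyTorch for a single target node. The dictionaries W_Q, W_K, W_V, W_ATT, and W_MSG stand in for HGTConv's internal per-type parameters; the dimensions, type names, and neighbor count here are made up for illustration.

```python
import torch

# Toy sketch of the simplified HGTConv attention for ONE target node i.
d = 8
torch.manual_seed(0)

# Type-indexed parameter dicts: one matrix per node type / edge type.
W_Q = {'user': torch.randn(d, d)}
W_K = {'product': torch.randn(d, d)}
W_V = {'product': torch.randn(d, d)}
W_ATT = {'purchases': torch.randn(d, d)}
W_MSG = {'purchases': torch.randn(d, d)}

h_i = torch.randn(d)        # target node i (type 'user')
h_js = torch.randn(3, d)    # three neighbors j (type 'product')

Q_i = W_Q['user'] @ h_i          # query: target-type projection
K_j = h_js @ W_K['product'].T    # keys: source-type projection
V_j = h_js @ W_V['product'].T    # values: source-type projection

# Edge-type-specific attention, softmax over neighbors j
scores = (K_j @ W_ATT['purchases'].T @ Q_i) / d ** 0.5  # Q_i^T · W_ATT · K_j
alpha = torch.softmax(scores, dim=0)

# Weighted aggregation with edge-type-specific message transform
h_i_new = (alpha.unsqueeze(-1) * (V_j @ W_MSG['purchases'].T)).sum(dim=0)
print(h_i_new.shape)
```

The real layer additionally splits these projections across attention heads and applies a residual connection, but the type-conditioned Q/K/V structure is the same.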

PyG implementation

hgt_model.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import HGTConv, Linear

class HGT(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels, num_heads,
                 num_layers, metadata):
        super().__init__()
        # metadata = (node_types, edge_types) from HeteroData
        self.lin_dict = torch.nn.ModuleDict()
        for node_type in metadata[0]:
            self.lin_dict[node_type] = Linear(-1, hidden_channels)

        self.convs = torch.nn.ModuleList()
        for _ in range(num_layers):
            self.convs.append(HGTConv(hidden_channels, hidden_channels,
                                       metadata, heads=num_heads))
        self.out = Linear(hidden_channels, out_channels)

    def forward(self, x_dict, edge_index_dict):
        # Project each node type to a shared hidden dim (ReLU as in the
        # official PyG HGT example)
        x_dict = {k: self.lin_dict[k](v).relu() for k, v in x_dict.items()}
        for conv in self.convs:
            x_dict = conv(x_dict, edge_index_dict)
        # 'target_node_type' is a placeholder: index with the node type
        # you are predicting on, e.g. x_dict['user']
        return self.out(x_dict['target_node_type'])

# With PyG HeteroData
from torch_geometric.data import HeteroData
data = HeteroData()
data['user'].x = user_features
data['product'].x = product_features
data['user', 'purchases', 'product'].edge_index = purchase_edges
data['user', 'reviews', 'product'].edge_index = review_edges

model = HGT(hidden_channels=64, out_channels=num_classes,
            num_heads=4, num_layers=2, metadata=data.metadata())

HGTConv works with PyG's HeteroData format. The metadata tuple (node_types, edge_types) is extracted automatically. Each node type can have different input dimensions.

When to use HGTConv

  • Enterprise relational data. Databases with multiple tables (customers, orders, products, merchants) connected by different relationships. HGTConv is designed exactly for this structure.
  • Complex heterogeneous graphs. Graphs with 3+ node types and 3+ edge types where different relationships have fundamentally different semantics.
  • When you need both types and attention. RGCNConv has types but no attention. GATConv has attention but no types. HGTConv has both.
  • Academic knowledge graph tasks. Node classification and link prediction on heterogeneous academic graphs (papers, authors, venues, topics).

When not to use HGTConv

  • Homogeneous graphs. If all nodes and edges are the same type, HGTConv adds unnecessary complexity. Use GATConv or TransformerConv instead.
  • Many types with limited data per type. With 100+ node types, the per-type projections may have too few training examples per parameter. Consider HeteroConv with shared base layers.

Frequently asked questions

What is HGTConv in PyTorch Geometric?

HGTConv implements the Heterogeneous Graph Transformer from Hu et al. (2020). It uses type-specific key, query, and value projections so each node type and edge type gets its own attention parameters. This lets the model learn fundamentally different attention patterns for different relationship types in a heterogeneous graph.

How does HGTConv differ from RGCNConv?

RGCNConv uses separate weight matrices per edge type but no attention mechanism. HGTConv adds transformer-style attention with type-specific projections: the query depends on the target node type, the key depends on the source node type, and the attention function depends on the edge type. This is both more expressive and more flexible.

What types does HGTConv support?

HGTConv supports both multiple node types and multiple edge types. Each combination of (source_type, edge_type, target_type) gets its own attention parameters. For example, (User, purchases, Product) uses different attention than (User, reviews, Product).

When should I use HGTConv vs HeteroConv?

HGTConv is a single integrated layer designed specifically for heterogeneous graphs with joint type-specific attention. HeteroConv is a generic wrapper that applies any base layer (GCNConv, GATConv, SAGEConv) per edge type. Use HGTConv when you want attention that is aware of all types simultaneously. Use HeteroConv when you want to reuse existing homogeneous layers.

Can HGTConv scale to large heterogeneous graphs?

HGTConv supports mini-batch training via PyG's HGTLoader, which performs type-aware neighbor sampling. This bounds computation per node regardless of graph size. However, the per-type projections increase memory with the number of types, so graphs with hundreds of node/edge types may need HeteroConv with simpler base layers.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.