
Homogeneous Graphs: Single Node and Edge Type Graphs

A homogeneous graph is the simplest graph structure: one type of node, one type of edge. It is the default data format in PyTorch Geometric and the starting point for most GNN research.

TL;DR

  • A homogeneous graph has one node type and one edge type. Every node shares the same feature space, and every edge represents the same relationship.
  • In PyG, homogeneous graphs use the Data object with x (node features), edge_index (connections), and optionally y (labels). This is the simplest and most common graph format.
  • Standard GNN layers like GCNConv, GATConv, and SAGEConv all operate on homogeneous graphs natively. No special configuration needed.
  • Social networks, citation graphs, and molecular graphs are natural homogeneous graphs. Enterprise relational databases are usually heterogeneous but can be converted.
  • For enterprise data with multiple entity types, consider HeteroData instead. You can always prototype on a homogeneous conversion and upgrade later.

A homogeneous graph is a graph with a single node type and a single edge type. Every node lives in the same feature space, and every edge represents the same kind of relationship. A friendship network where every node is a person and every edge is a friendship is homogeneous. So is a citation network where every node is a paper and every edge is a citation.

This is the default graph structure in PyTorch Geometric. When you create a Data object with node features and an edge index, you are building a homogeneous graph. Most GNN layers, benchmark datasets, and tutorials assume homogeneous graphs.

When graphs are homogeneous

A graph is homogeneous when every entity in the system is fundamentally the same kind of thing:

  • Social networks: all nodes are users, all edges are connections
  • Citation networks: all nodes are papers, all edges are citations
  • Molecular graphs: all nodes are atoms, all edges are bonds (with bond type as an edge feature)
  • Road networks: all nodes are intersections, all edges are road segments

The PyG Data object

In PyTorch Geometric, a homogeneous graph is represented by the torch_geometric.data.Data class. It stores:

  • x: node feature matrix of shape [num_nodes, num_features]
  • edge_index: graph connectivity in COO format, shape [2, num_edges]
  • edge_attr: edge features (optional)
  • y: target labels (optional)
create_homogeneous_graph.py
import torch
from torch_geometric.data import Data

# 4 nodes, each with 3 features
x = torch.tensor([
    [1.0, 0.5, 0.2],  # Node 0
    [0.3, 0.8, 0.1],  # Node 1
    [0.7, 0.2, 0.9],  # Node 2
    [0.4, 0.6, 0.3],  # Node 3
])

# Undirected ring 0-1-2-3-0: each edge is stored once per direction
edge_index = torch.tensor([
    [0, 1, 1, 2, 2, 3, 3, 0],
    [1, 0, 2, 1, 3, 2, 0, 3],
], dtype=torch.long)

data = Data(x=x, edge_index=edge_index)
print(data)
# Data(x=[4, 3], edge_index=[2, 8])

A minimal homogeneous graph. All 4 nodes share the same 3-dimensional feature space.

Enterprise example: fraud ring detection

Consider a bank that wants to detect fraud rings among account holders. Every node is an account. An edge connects two accounts that have transferred money to each other. Node features include account age, average balance, and transaction velocity.

This is a natural homogeneous graph because every entity is the same type: a bank account. A 2-layer GCN on this graph lets each account aggregate information from its 2-hop transaction neighborhood. Fraud rings, where a cluster of accounts transacts mostly with each other, create distinctive structural patterns that message passing can pick up without hand-crafted graph features.

fraud_ring_homogeneous.py
import torch
from torch_geometric.nn import GCNConv
import torch.nn.functional as F

class FraudDetector(torch.nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.conv1 = GCNConv(num_features, 64)
        self.conv2 = GCNConv(64, 32)
        self.classifier = torch.nn.Linear(32, 2)

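    def forward(self, x, edge_index):
        # Layer 1: each account aggregates features from its direct neighbors
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        # Layer 2: extends the receptive field to the 2-hop neighborhood
        x = self.conv2(x, edge_index)
        # Per-node, two-class logits (no softmax applied here)
        return self.classifier(x)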

# All nodes are accounts, all edges are transfers
# Standard GCNConv works directly on homogeneous graphs

A simple 2-layer GCN for node classification on a homogeneous transaction graph.

Homogeneous vs heterogeneous: when to upgrade

The key question is whether treating all entities as the same type loses important information:

  • Stay homogeneous when all nodes genuinely share the same feature space and semantics. A social network of users connected by friendships. A molecular graph of atoms connected by bonds.
  • Go heterogeneous when your data has fundamentally different entity types. An e-commerce database with users, products, and orders. A healthcare system with patients, doctors, and prescriptions. Forcing these into a single node type means padding features with zeros and losing type-specific semantics.
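
The zero-padding cost mentioned above can be made concrete with a small sketch. The feature names and values are hypothetical. The point is only to show how much of a merged feature matrix ends up as filler when two entity types with different features share one node type.

```python
import torch

# Hypothetical features: users have 2 (age, tenure), products have 3 (price, weight, rating)
user_x = torch.tensor([[25.0, 3.0], [41.0, 7.0]])
product_x = torch.tensor([[19.99, 0.5, 4.2]])

# Force both types into one shared feature space by padding with zeros
padded_users = torch.cat(
    [user_x, torch.zeros(user_x.size(0), product_x.size(1))], dim=1
)
padded_products = torch.cat(
    [torch.zeros(product_x.size(0), user_x.size(1)), product_x], dim=1
)
x = torch.cat([padded_users, padded_products], dim=0)

print(x.shape)  # torch.Size([3, 5])

# Share of entries that are padding rather than real information
zero_fraction = (x == 0).float().mean().item()
print(round(zero_fraction, 3))  # 0.533
```

With more entity types the padded share grows further, and the model also has no explicit signal telling it which columns are meaningful for which rows.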

In practice, most enterprise relational databases are heterogeneous. Even so, prototyping on a homogeneous conversion (by calling to_homogeneous() on a HeteroData object) is a valid way to get a baseline before building a full heterogeneous pipeline.

Standard layers that work on homogeneous graphs

All standard PyG convolutional layers operate on homogeneous graphs:

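  • GCNConv: degree-normalized message passing
  • GATConv: attention-weighted message passing
  • SAGEConv: the GraphSAGE operator, which aggregates neighbor features (mean by default) and combines them with a separate transform of the node itself; commonly paired with neighbor sampling on large graphs
  • GINConv: sum aggregation followed by an MLP, as expressive as the 1-WL graph isomorphism test
  • TransformerConv: transformer-style attention on neighbors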

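For heterogeneous graphs, you would use to_hetero() to convert a homogeneous GNN model into a heterogeneous one, or use dedicated heterogeneous layers like HGTConv. The conversion requires layers that support bipartite message passing (for example SAGEConv or GATConv), because source and destination nodes may belong to different types.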

Frequently asked questions

What is a homogeneous graph?

A homogeneous graph is a graph where every node belongs to the same type and every edge represents the same kind of relationship. A social network where all nodes are users and all edges are friendships is a classic example. In PyTorch Geometric, homogeneous graphs use the basic Data object.

What is the difference between homogeneous and heterogeneous graphs?

In a homogeneous graph, there is one node type and one edge type. In a heterogeneous graph, there are multiple node types (e.g., users, products, orders) and multiple edge types (e.g., purchases, reviews). Heterogeneous graphs use PyG's HeteroData object while homogeneous graphs use the simpler Data object.

When should I use a homogeneous graph vs a heterogeneous graph?

Use a homogeneous graph when all entities in your data are the same kind (e.g., molecules where all nodes are atoms, citation networks where all nodes are papers). Use a heterogeneous graph when your data has fundamentally different entity types like customers, products, and transactions. Most enterprise relational databases are naturally heterogeneous.

How do I create a homogeneous graph in PyTorch Geometric?

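Create a torch_geometric.data.Data object with x (node features, typically a float tensor of shape [num_nodes, num_features]), edge_index (a long tensor of shape [2, num_edges] whose columns are source-target pairs), and optionally y (labels) and edge_attr (edge features). PyG supplies the surrounding tooling: DataLoader for mini-batching, data.to(device) for GPU transfer, and NeighborLoader for neighbor sampling.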

Can I convert a heterogeneous graph to a homogeneous graph?

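Yes. PyG provides to_homogeneous() on HeteroData objects. It merges all node types into a single node set, offsets the edge indices accordingly, and by default merges features that have the same dimensionality across types. The original types are kept as integer node_type and edge_type vectors, but standard layers ignore them, so the model no longer applies type-specific weights. In exchange you gain compatibility with all standard GNN layers, which is useful for quick prototyping before building a proper heterogeneous model.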
