A homogeneous graph is a graph where all nodes and edges are the same type. Every node lives in the same feature space, and every edge represents the same kind of relationship. A friendship network where every node is a person and every edge is a friendship is homogeneous. A citation network where every node is a paper and every edge is a citation is homogeneous.
This is the default graph structure in PyTorch Geometric. When you create a Data object with node features and an edge index, you are building a homogeneous graph. Most GNN layers, benchmark datasets, and tutorials assume homogeneous graphs.
When graphs are homogeneous
A graph is homogeneous when every entity in the system is fundamentally the same kind of thing:
- Social networks: all nodes are users, all edges are connections
- Citation networks: all nodes are papers, all edges are citations
- Molecular graphs: all nodes are atoms, all edges are bonds (with bond type as an edge feature)
- Road networks: all nodes are intersections, all edges are road segments
The PyG Data object
In PyTorch Geometric, a homogeneous graph is represented by the torch_geometric.data.Data class. It stores:
- x: node feature matrix of shape [num_nodes, num_features]
- edge_index: graph connectivity in COO format, shape [2, num_edges]
- edge_attr: edge features (optional)
- y: target labels (optional)
import torch
from torch_geometric.data import Data

# 4 nodes, each with 3 features
x = torch.tensor([
    [1.0, 0.5, 0.2],  # Node 0
    [0.3, 0.8, 0.1],  # Node 1
    [0.7, 0.2, 0.9],  # Node 2
    [0.4, 0.6, 0.3],  # Node 3
])

# Edges: 0->1, 1->2, 2->3, 3->0 (undirected = both directions)
edge_index = torch.tensor([
    [0, 1, 1, 2, 2, 3, 3, 0],
    [1, 0, 2, 1, 3, 2, 0, 3],
], dtype=torch.long)

data = Data(x=x, edge_index=edge_index)
print(data)
# Data(x=[4, 3], edge_index=[2, 8])

A minimal homogeneous graph. All 4 nodes share the same 3-dimensional feature space.
Enterprise example: fraud ring detection
Consider a bank that wants to detect fraud rings among account holders. Every node is an account. An edge connects two accounts that have transferred money to each other. Node features include account age, average balance, and transaction velocity.
This is a natural homogeneous graph because every entity is the same type: a bank account. A 2-layer GCN on this graph lets each account see its 2-hop transaction neighborhood. Fraud rings, where a cluster of accounts transacts almost exclusively with each other, create distinctive structural patterns that message passing can pick up without hand-written rules.
import torch
from torch_geometric.nn import GCNConv
import torch.nn.functional as F

class FraudDetector(torch.nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.conv1 = GCNConv(num_features, 64)
        self.conv2 = GCNConv(64, 32)
        self.classifier = torch.nn.Linear(32, 2)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return self.classifier(x)

# All nodes are accounts, all edges are transfers
# Standard GCNConv works directly on homogeneous graphs

A simple 2-layer GCN for node classification on a homogeneous transaction graph.
Homogeneous vs heterogeneous: when to upgrade
The key question is whether treating all entities as the same type loses important information:
- Stay homogeneous when all nodes genuinely share the same feature space and semantics. A social network of users connected by friendships. A molecular graph of atoms connected by bonds.
- Go heterogeneous when your data has fundamentally different entity types. An e-commerce database with users, products, and orders. A healthcare system with patients, doctors, and prescriptions. Forcing these into a single node type means padding features with zeros and losing type-specific semantics.
In practice, most enterprise relational databases are heterogeneous. But prototyping on a homogeneous conversion (using HeteroData.to_homogeneous()) is a valid strategy to get a baseline before building a full heterogeneous pipeline.
Standard layers that work on homogeneous graphs
All standard PyG convolutional layers operate on homogeneous graphs:
- GCNConv: degree-normalized message passing
- GATConv: attention-weighted message passing
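- SAGEConv: GraphSAGE neighborhood aggregation (mean by default), commonly paired with neighbor sampling for large graphs
- GINConv: sum aggregation, as expressive as the 1-WL isomorphism test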
- TransformerConv: transformer-style attention on neighbors
For heterogeneous graphs, you would use to_hetero() to automatically convert a model built from these layers into a heterogeneous one, or use dedicated heterogeneous layers like HGTConv.