What is a homogeneous graph?

A homogeneous graph is a graph where every node belongs to the same type and every edge represents the same kind of relationship. A social network where all nodes are users and all edges are friendships is a classic example. In PyTorch Geometric, homogeneous graphs use the basic Data object.

What is the difference between homogeneous and heterogeneous graphs?

In a homogeneous graph, there is one node type and one edge type. In a heterogeneous graph, there are multiple node types (e.g., users, products, orders), multiple edge types (e.g., purchases, reviews), or both. Heterogeneous graphs use PyG's HeteroData object, while homogeneous graphs use the simpler Data object.

When should I use a homogeneous graph vs a heterogeneous graph?

Use a homogeneous graph when all entities in your data are the same kind (e.g., molecules where all nodes are atoms, citation networks where all nodes are papers). Use a heterogeneous graph when your data has fundamentally different entity types like customers, products, and transactions. Most enterprise relational databases are naturally heterogeneous.

How do I create a homogeneous graph in PyTorch Geometric?

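Create a torch_geometric.data.Data object with x (node features as a float tensor of shape [num_nodes, num_features]), edge_index (a long tensor of shape [2, num_edges] holding source and target node indices), and optionally y (labels) and edge_attr (edge features). Calling data.to(device) moves every tensor in the object to the GPU, and PyG's loaders (for example DataLoader and NeighborLoader) take care of batching and neighbor sampling.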

Can I convert a heterogeneous graph to a homogeneous graph?

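Yes. PyG provides to_homogeneous() on HeteroData objects. This merges all node types into a single index space, merges features that share the same dimensionality across types, and offsets edge indices accordingly. By default the result keeps the original types only as integer node_type and edge_type vectors; standard GNN layers ignore these, so the model treats every node and edge alike. In exchange, you gain compatibility with all standard GNN layers. This is useful for quick prototyping before building a proper heterogeneous model.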

Homogeneous Graphs in PyG: Single Node and Edge Type Graphs | Kumo.ai

A homogeneous graph is a graph where all nodes and edges are the same type. Every node lives in the same feature space, and every edge represents the same kind of relationship. A friendship network where every node is a person and every edge is a friendship is homogeneous. A citation network where every node is a paper and every edge is a citation is homogeneous.

This is the default graph structure in PyTorch Geometric. When you create a Data object with node features and an edge index, you are building a homogeneous graph. Most GNN layers, benchmark datasets, and tutorials assume homogeneous graphs.

When graphs are homogeneous

A graph is homogeneous when every entity in the system is fundamentally the same kind of thing:

Social networks: all nodes are users, all edges are connections
Citation networks: all nodes are papers, all edges are citations
Molecular graphs: all nodes are atoms, all edges are bonds (with bond type as an edge feature)
Road networks: all nodes are intersections, all edges are road segments

The PyG Data object

In PyTorch Geometric, a homogeneous graph is represented by the torch_geometric.data.Data class. It stores:

x: node feature matrix of shape [num_nodes, num_features]
edge_index: graph connectivity in COO format, shape [2, num_edges]
edge_attr: edge features (optional)
y: target labels (optional)

create_homogeneous_graph.py

import torch
from torch_geometric.data import Data

# 4 nodes, each with 3 features
x = torch.tensor([
    [1.0, 0.5, 0.2],  # Node 0
    [0.3, 0.8, 0.1],  # Node 1
    [0.7, 0.2, 0.9],  # Node 2
    [0.4, 0.6, 0.3],  # Node 3
])

# Undirected edges 0-1, 1-2, 2-3, 3-0, each stored in both directions
edge_index = torch.tensor([
    [0, 1, 1, 2, 2, 3, 3, 0],
    [1, 0, 2, 1, 3, 2, 0, 3],
], dtype=torch.long)

data = Data(x=x, edge_index=edge_index)
print(data)
# Data(x=[4, 3], edge_index=[2, 8])

A minimal homogeneous graph. All 4 nodes share the same 3-dimensional feature space.

Enterprise example: fraud ring detection

Consider a bank that wants to detect fraud rings among account holders. Every node is an account. An edge connects two accounts that have transferred money to each other. Node features include account age, average balance, and transaction velocity.

This is a natural homogeneous graph because every entity is the same type: a bank account. A 2-layer GCN on this graph lets each account aggregate information from its 2-hop transaction neighborhood. Fraud rings, where a cluster of accounts transacts almost exclusively with each other, create distinctive structural patterns that message passing can learn to pick up without hand-crafted graph features.

fraud_ring_homogeneous.py

import torch
from torch_geometric.nn import GCNConv
import torch.nn.functional as F

class FraudDetector(torch.nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.conv1 = GCNConv(num_features, 64)
        self.conv2 = GCNConv(64, 32)
        self.classifier = torch.nn.Linear(32, 2)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return self.classifier(x)

# All nodes are accounts, all edges are transfers
# Standard GCNConv works directly on homogeneous graphs

A simple 2-layer GCN for node classification on a homogeneous transaction graph.

Homogeneous vs heterogeneous: when to upgrade

The key question is whether treating all entities as the same type loses important information:

Stay homogeneous when all nodes genuinely share the same feature space and semantics, such as a social network of users connected by friendships or a molecular graph of atoms connected by bonds.
Go heterogeneous when your data has fundamentally different entity types, such as an e-commerce database with users, products, and orders, or a healthcare system with patients, doctors, and prescriptions. Forcing these into a single node type means padding features with zeros and losing type-specific semantics.

In practice, most enterprise relational databases are heterogeneous. But prototyping on a homogeneous conversion (using to_homogeneous()) is a valid strategy to get a baseline before building a full heterogeneous pipeline.

Standard layers that work on homogeneous graphs

All standard PyG convolutional layers operate on homogeneous graphs:

GCNConv: degree-normalized message passing
GATConv: attention-weighted message passing
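SAGEConv: GraphSAGE neighbor aggregation (mean by default), often paired with neighbor sampling on large graphs
GINConv: sum aggregation followed by an MLP, as expressive as the 1-WL isomorphism test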
TransformerConv: transformer-style attention on neighbors

For heterogeneous graphs, you would use to_hetero() to automatically convert a homogeneous model built from these layers, or use dedicated heterogeneous layers like HGTConv.

Key Takeaways

1. A homogeneous graph has one node type and one edge type. All nodes share the same feature space. This is the default and simplest graph structure in PyG.
2. Use PyG's Data object to represent homogeneous graphs. It stores node features (x), edges (edge_index), and optional labels (y) and edge features (edge_attr).
3. Social networks, citation graphs, molecular graphs, and road networks are natural homogeneous graphs. All standard GNN layers work on them directly.
4. Enterprise relational databases are usually heterogeneous. Forcing multiple entity types into a single node type loses information. Use HeteroData for multi-table data.
5. Prototyping tip: convert heterogeneous data to homogeneous with to_homogeneous() for a quick baseline, then upgrade to a proper heterogeneous model for production.
