Heterophily is the graph property where connected nodes tend to have different labels or dissimilar features, and it is the regime where standard GNN architectures perform worst. Most GNN layers aggregate neighbor features through averaging (GCNConv) or weighted averaging (GATConv). Under heterophily, neighbors have different labels, so averaging mixes conflicting class signals. The node's own informative features get diluted by neighbor noise. After 2-3 layers, the class-specific signal can be destroyed entirely.
Why it matters for enterprise data
Enterprise relational databases frequently contain heterophilous relationships:
- Customer-product graphs: A customer buying from many product categories creates edges between the customer (one behavior profile) and diverse products (different category labels).
- Supply chain networks: Manufacturers connect to distributors, which connect to retailers. Each entity type has different characteristics and labels.
- Fraud-victim interactions: Fraudsters target victims with different risk profiles. The edge between fraudster and victim is inherently heterophilous.
Applying a standard GCNConv to these graphs can perform worse than ignoring graph structure entirely. Recognizing heterophily and choosing the right architecture is critical for enterprise GNN deployment.
How heterophily degrades GNN performance
Consider node classification on a graph with edge homophily h = 0.2 (80% of edges cross class boundaries):
- Layer 0: Each node has its own features, distinct per class. An MLP would classify correctly.
- Layer 1 (GCNConv): Each node averages its neighbors' features. Since 80% of neighbors are different-class, the averaged features are dominated by the wrong class signal.
- Layer 2: Averaging again further dilutes the original signal. The node's representation now reflects the majority class of its 2-hop neighborhood, not its own class.
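The layer-1 step above can be made concrete with a toy calculation (hypothetical one-dimensional class means, chosen only for illustration):

```python
# Hypothetical 1-D features: class-A nodes sit near +1, class-B nodes near -1
mu_a, mu_b = 1.0, -1.0
h = 0.2  # edge homophily: only 20% of a node's neighbors share its class

# One GCN-style averaging step for a class-A node:
# a fraction h of neighbors are same-class (near +1),
# a fraction 1 - h are cross-class (near -1)
neigh_mean = h * mu_a + (1 - h) * mu_b

print(neigh_mean)  # -0.6

# The averaged representation now carries the SIGN of class B, so a
# linear classifier that separated +1 from -1 would misclassify the node.
assert neigh_mean < 0
```

One averaging step is enough to flip the sign of the representation; a second step (layer 2) repeats the same mixing on already-corrupted inputs.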
Solutions for heterophilous graphs
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class SeparateAggGNN(torch.nn.Module):
    """Handle heterophily by keeping ego and neighbor features separate."""

    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        # Separate transforms for self and neighbors
        self.self_lin = torch.nn.Linear(in_dim, hidden_dim)
        self.neigh_conv = GCNConv(in_dim, hidden_dim)
        # Learnable combination
        self.combine = torch.nn.Linear(hidden_dim * 2, out_dim)

    def forward(self, x, edge_index):
        # Keep ego representation separate from aggregated neighbors
        x_self = F.relu(self.self_lin(x))
        x_neigh = F.relu(self.neigh_conv(x, edge_index))
        # Concatenate and let the model learn how to combine
        x_combined = torch.cat([x_self, x_neigh], dim=-1)
        return self.combine(x_combined)


# Key insight: by separating self and neighbor features, the model can
# learn to IGNORE the neighbor signal when it conflicts (heterophily)
# or REINFORCE it when it agrees (homophily).
```

Separating ego and neighbor representations is the simplest heterophily-aware technique. The model learns whether neighbors help or hurt for each feature dimension.
Approach 1: Ego-neighbor separation
Process a node's own features and its aggregated neighbor features through separate neural networks, then combine them. This lets the model learn to ignore or even negate the neighbor signal when it conflicts with the ego signal.
Approach 2: Higher-order neighborhoods
Even if 1-hop neighbors are heterophilous, 2-hop neighbors might be homophilous. In a bipartite graph, nodes 2 hops away are the same type (customer → product → customer). Concatenating features from multiple hop distances lets the model find the right aggregation scale.
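A minimal sketch of multi-hop concatenation, using a dense numpy toy graph for clarity (the function name and graph are illustrative; production code would use sparse operations):

```python
import numpy as np

def multi_hop_features(A, X, num_hops=2):
    """Concatenate features aggregated from 0..num_hops-hop neighborhoods.

    A: (n, n) adjacency matrix, X: (n, d) node features.
    Each hop applies row-normalized mean aggregation; a downstream model
    can then learn which hop distance carries the homophilous signal.
    """
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    A_norm = A / deg                      # row-normalized mean aggregation
    hops = [X]
    for _ in range(num_hops):
        hops.append(A_norm @ hops[-1])    # features averaged one hop further
    return np.concatenate(hops, axis=1)   # shape (n, d * (num_hops + 1))

# Toy bipartite graph: customers {0, 1} connect only to products {2, 3}
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
X = np.eye(4)  # one-hot features per node

Z = multi_hop_features(A, X, num_hops=2)
# For customer 0, the 1-hop block mixes product features (the other type),
# while the 2-hop block returns to customer features (the same type).
```

On this bipartite toy graph the 2-hop block of customer 0 places all its mass on the customer nodes, which is exactly the "right aggregation scale" the paragraph describes.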
Approach 3: Graph transformers
Graph transformers with attention can learn to assign negative effective weights to heterophilous neighbors, effectively subtracting their influence. This happens naturally through the attention mechanism without explicit architectural modifications.
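A stripped-down sketch of global self-attention (no learned projections, Q = K = V = X, purely for illustration and not any particular library's implementation) shows how dissimilar nodes get down-weighted regardless of graph structure:

```python
import numpy as np

def global_attention(X):
    """One global self-attention step: every node attends to every node.
    Similar nodes receive high attention weight; dissimilar (heterophilous)
    ones are down-weighted, whether or not an edge connects them."""
    d_k = X.shape[1]
    scores = X @ X.T / np.sqrt(d_k)                       # pairwise similarity
    scores = scores - scores.max(axis=1, keepdims=True)   # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X, weights

# Two similar class-A nodes, one class-B node with opposite-sign features
X = np.array([[ 1.0,  0.9],
              [ 0.9,  1.0],
              [-1.0, -1.0]])
out, W = global_attention(X)

# Node 0 attends far more to its same-class peer than to the class-B node
assert W[0, 1] > W[0, 2]
```

Real graph transformers add learned query/key/value projections, so they can also learn similarity functions beyond the raw dot product shown here.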
Limitations and what comes next
- Mixed homophily is the norm: Real enterprise graphs have both homophilous and heterophilous edges. Global metrics like edge homophily ratio oversimplify. Per-edge or per-node homophily is more informative but harder to optimize for.
- Heterophily benchmarks are limited: Academic benchmarks (Texas, Wisconsin, Cornell) are small and have high label noise. Results on these datasets do not always transfer to enterprise-scale heterophilous graphs.
- Detection requires labels: You cannot measure homophily without labels. For unsupervised tasks, you must infer heterophily from feature similarity or use architecture-agnostic approaches.
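When labels are available, the global and per-node metrics mentioned above are cheap to compute. A minimal sketch (hypothetical helper functions, plain numpy):

```python
import numpy as np

def edge_homophily(edge_index, labels):
    """Fraction of edges whose endpoints share a label (global metric)."""
    src, dst = edge_index
    return float(np.mean(labels[src] == labels[dst]))

def node_homophily(edge_index, labels, num_nodes):
    """Per-node fraction of same-label neighbors; more informative than
    the single global ratio on mixed-homophily graphs."""
    src, dst = edge_index
    same = (labels[src] == labels[dst]).astype(float)
    agree = np.bincount(src, weights=same, minlength=num_nodes)
    deg = np.bincount(src, minlength=num_nodes).clip(min=1)
    return agree / deg

# Toy directed graph: nodes 0-2 are class 0, node 3 is class 1
edge_index = np.array([[0, 1, 2, 3],
                       [1, 2, 3, 0]])
labels = np.array([0, 0, 0, 1])

print(edge_homophily(edge_index, labels))  # 0.5: half the edges cross classes
```

Here node 0 is perfectly homophilous (its one out-edge stays in class 0) while node 2 is perfectly heterophilous, even though the global ratio is a bland 0.5, illustrating why the per-node view matters.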
KumoRFM's Relational Graph Transformer handles both homophily and heterophily through global attention that adapts to local graph properties. It achieves strong performance on RelBench tasks regardless of the underlying homophily pattern.