Graph isomorphism is the foundational concept that defines what GNNs can and cannot distinguish. Two graphs are isomorphic when they have identical structure, meaning there is a node-to-node mapping that preserves all edges. If a GNN produces the same output for two non-isomorphic graphs, it has lost structural information. The Weisfeiler-Leman test defines the theoretical boundary of what standard message passing can distinguish.
The isomorphism problem
Given two graphs, are they isomorphic? This sounds simple for small graphs but is surprisingly hard: there is no known polynomial-time algorithm for general graphs. For GNNs, the question becomes: can this architecture produce different outputs for two non-isomorphic graphs? If not, the model cannot leverage the structural difference for prediction.
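To make the cost concrete, here is a brute-force check for tiny graphs (`are_isomorphic` is an illustrative helper, not a practical algorithm): it tries all n! node relabelings, which is exactly the search a polynomial-time algorithm would have to avoid.

```python
from itertools import permutations

def are_isomorphic(n, edges_a, edges_b):
    """Brute-force isomorphism test for two undirected graphs on n nodes.

    Tries every one of the n! node permutations, so it is only
    feasible for tiny graphs.
    """
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    if len(ea) != len(eb):
        return False
    return any({frozenset((p[u], p[v])) for u, v in ea} == eb
               for p in permutations(range(n)))

# A 4-cycle vs. a triangle with a pendant node: same node and edge
# counts, but no relabeling maps one onto the other.
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
tri_pendant = [(0, 1), (1, 2), (2, 0), (0, 3)]
```

Edges are stored as frozensets so that undirected edges compare equal regardless of endpoint order.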
The Weisfeiler-Leman test
The 1-WL test is an iterative color refinement algorithm:
- Initialize: assign every node the same color (or use its degree as the initial color)
- Iterate: each node's new color = hash(own color, sorted multiset of neighbors' colors)
- Repeat until the partition of nodes into color classes stabilizes
- Compare: if the multisets of final colors differ between two graphs, they are definitely non-isomorphic
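The steps above can be sketched in a few lines of Python; `wl_colors` and the degree-based initialization are illustrative choices, and Python's built-in `hash` stands in for an injective recoloring function:

```python
from collections import Counter

def wl_colors(adj):
    """1-WL color refinement. adj maps each node to its neighbor list.

    Returns the multiset of stable colors; differing multisets prove
    two graphs non-isomorphic, identical multisets prove nothing.
    """
    # Initialize: use each node's degree as its starting color.
    colors = {v: len(nbrs) for v, nbrs in adj.items()}
    while True:
        # Iterate: new color = hash(own color, sorted neighbor colors).
        colors_new = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                      for v in adj}
        # Repeat until the number of color classes stops growing.
        if len(set(colors_new.values())) == len(set(colors.values())):
            return Counter(colors_new.values())
        colors = colors_new

# Compare: a 4-node path and a 4-node star get different color multisets.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

The stabilization check relies on refinement only ever splitting color classes: once the class count stops growing, the partition is fixed.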
If the 1-WL test declares two graphs “possibly isomorphic,” they may still be non-isomorphic: differing color multisets are definitive proof of non-isomorphism, but identical multisets prove nothing. The test is one-sided, and it fails to distinguish many non-isomorphic pairs, notably regular graphs, where every node has the same degree and refinement never produces new color classes.
GIN: achieving the ceiling
Most GNN layers do not reach the 1-WL ceiling. GCN uses degree-normalized, mean-like aggregation, which discards the size of the neighbor multiset: it cannot distinguish a node with 3 neighbors that all have feature 1 from a node with a single neighbor with feature 1. GraphSAGE with max aggregation discards even more, keeping only the set of distinct values.
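That lossiness is easy to see numerically; a minimal sketch with NumPy arrays standing in for neighbor feature multisets:

```python
import numpy as np

# Neighbor feature multisets for two different nodes.
three_ones = np.array([1.0, 1.0, 1.0])  # three neighbors, each with feature 1
one_one = np.array([1.0])               # a single neighbor with feature 1

# Mean aggregation collapses both multisets to the same message...
mean_a, mean_b = three_ones.mean(), one_one.mean()
# ...while sum aggregation keeps the multiplicity.
sum_a, sum_b = three_ones.sum(), one_one.sum()
```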
GIN (Graph Isomorphism Network) is designed to match the 1-WL ceiling:
- Sum aggregation: preserves the full multiset of neighbor features, including multiplicity. Among the standard aggregators (sum, mean, max), sum is the only one that is injective over multisets.
- MLP update: an MLP (not a linear layer) ensures the update function is injective. This preserves the distinction between different aggregated messages.
- Epsilon parameter: a learnable weight on the self-loop ensures the node's own features can be weighted differently from the aggregated neighbors.
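The three ingredients combine into a single update, h'_v = MLP((1 + ε) · h_v + Σ_{u ∈ N(v)} h_u). A one-layer sketch in NumPy, assuming a dense 0/1 adjacency matrix and a 2-layer ReLU MLP (all names and shapes illustrative):

```python
import numpy as np

def gin_layer(h, adj, w1, b1, w2, b2, eps=0.0):
    """One GIN layer: h'_v = MLP((1 + eps) * h_v + sum of neighbor features).

    h: (N, d) node features; adj: (N, N) dense 0/1 adjacency matrix.
    w1, b1, w2, b2: parameters of a 2-layer MLP with ReLU; eps: the
    self-loop weight (a learnable scalar in the real model, a float here).
    """
    # Sum aggregation keeps the full neighbor multiset, including multiplicity.
    agg = (1.0 + eps) * h + adj @ h
    # An MLP (not a single linear map) keeps distinct aggregates distinct.
    hidden = np.maximum(agg @ w1 + b1, 0.0)  # ReLU
    return hidden @ w2 + b2

# Tiny 4-node path graph with random features and weights.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
h = rng.normal(size=(4, 3))
w1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 3)), np.zeros(3)
out = gin_layer(h, adj, w1, b1, w2, b2, eps=0.1)
```

Stacking several such layers and summing the node embeddings into a graph readout gives the full architecture.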
Why expressiveness matters in practice
Consider two molecular graphs (carbon skeletons only) that are clearly non-isomorphic:
- Molecule A: cyclohexane (a single 6-membered ring)
- Molecule B: a pair of cyclopropanes (two separate 3-membered rings)
Both have six atoms and six bonds, and both are 2-regular (every atom has degree 2), so 1-WL color refinement never splits the nodes into distinct classes. The 1-WL test, and therefore any standard message-passing GNN, cannot distinguish them: a GCN would assign both the same graph embedding, even though their chemical properties are completely different.
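This failure can be checked directly; a compact sketch, running hashed color refinement on the two carbon skeletons (a single 6-cycle versus two disjoint 3-cycles, so that both graphs are 2-regular):

```python
from collections import Counter

def wl_multiset(adj, rounds=3):
    """A few rounds of 1-WL refinement; returns the multiset of colors."""
    colors = {v: len(nbrs) for v, nbrs in adj.items()}
    for _ in range(rounds):
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return Counter(colors.values())

# Cyclohexane's carbon skeleton: a single 6-cycle.
ring6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
# A pair of cyclopropanes: two disjoint 3-cycles.
two_rings3 = {0: [1, 2], 1: [0, 2], 2: [0, 1],
              3: [4, 5], 4: [3, 5], 5: [3, 4]}

# Every node in both graphs has degree 2, so refinement never creates
# a new color: the final multisets are identical.
```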
For drug discovery, this indistinguishability is a real limitation. Graph transformers and higher-order GNNs overcome it.
Beyond 1-WL: higher-order expressiveness
Several approaches exceed the 1-WL ceiling:
- k-WL and k-FWL: operate on k-tuples of nodes instead of individual nodes. Strictly more expressive for larger k, but the O(N^k) time and memory cost grows exponentially with k.
- Graph transformers: global attention can capture structural patterns that local message passing misses. Structural positional encodings (Laplacian eigenvectors) provide information beyond what 1-WL computes.
- Subgraph GNNs: process collections of subgraphs around each node, providing additional structural information.
- Random features: adding random node features breaks symmetries that 1-WL preserves, increasing expressiveness at the cost of determinism.
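As a concrete instance of positional encodings carrying beyond-1-WL information, the Laplacian spectra of the single 6-ring and the pair of 3-rings differ even though 1-WL cannot separate them. A sketch (`laplacian_spectrum` and `cycle` are illustrative helpers):

```python
import numpy as np

def laplacian_spectrum(adj):
    """Sorted eigenvalues of the combinatorial Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(lap))

def cycle(n):
    """Dense adjacency matrix of an n-cycle."""
    a = np.zeros((n, n))
    for i in range(n):
        a[i, (i + 1) % n] = a[(i + 1) % n, i] = 1.0
    return a

# One 6-cycle vs. two disjoint 3-cycles: both 2-regular, so 1-WL sees
# them as identical, but their Laplacian spectra differ.
c6 = cycle(6)
two_c3 = np.zeros((6, 6))
two_c3[:3, :3] = cycle(3)
two_c3[3:, 3:] = cycle(3)
```

Feeding a few of the corresponding eigenvectors to each node as extra input features is exactly what Laplacian positional encodings do.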