
Graph Isomorphism: When Two Graphs Are Structurally Identical

Graph isomorphism asks: are these two graphs the same structure with different labels? The answer defines the expressiveness ceiling for GNNs. If a GNN cannot distinguish two non-isomorphic graphs, it cannot use the structural difference for prediction.


TL;DR

  1. Two graphs are isomorphic if there exists a node mapping preserving all edges. They are structurally identical despite different labeling. Determining isomorphism is computationally hard in general.
  2. The 1-WL (Weisfeiler-Leman) test iteratively labels nodes by their neighborhood. If label multisets differ, graphs are non-isomorphic. It cannot distinguish all non-isomorphic graphs but is efficient and practical.
  3. Standard message passing GNNs are bounded by the 1-WL test: they cannot distinguish graphs that 1-WL cannot. This is the expressiveness ceiling for GCN, GAT, and GraphSAGE.
  4. GIN (Graph Isomorphism Network) achieves this ceiling by using sum aggregation and MLP updates. It is provably the most expressive standard message passing GNN.
  5. Graph transformers and higher-order GNNs (k-WL) exceed the 1-WL ceiling, distinguishing graphs that standard GNNs cannot. This matters for molecular property prediction, where subtle structural differences change properties.

Graph isomorphism is the foundational concept that defines what GNNs can and cannot distinguish. Two graphs are isomorphic when they have identical structure, meaning there is a node-to-node mapping that preserves all edges. If a GNN produces the same output for two non-isomorphic graphs, it has lost structural information. The Weisfeiler-Leman test defines the theoretical boundary of what standard message passing can distinguish.

The isomorphism problem

Given two graphs, are they isomorphic? This sounds simple for small graphs but is surprisingly hard: there is no known polynomial-time algorithm for general graphs. For GNNs, the question becomes: can this architecture produce different outputs for two non-isomorphic graphs? If not, the model cannot leverage the structural difference for prediction.
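For graphs of a handful of nodes, the definition can be checked directly by trying every node permutation. The sketch below (illustrative names, plain Python) also makes the combinatorial blow-up concrete: the loop runs over n! permutations, which is exactly why this approach dies beyond tiny graphs.

```python
from itertools import permutations

def are_isomorphic(edges_a, edges_b, n):
    """Brute-force isomorphism check for two n-node graphs.

    Tries every node permutation (n! of them), so it is only
    feasible for tiny graphs.
    """
    set_a = {frozenset(e) for e in edges_a}
    set_b = {frozenset(e) for e in edges_b}
    if len(set_a) != len(set_b):
        return False  # different edge counts: cannot be isomorphic
    for perm in permutations(range(n)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in set_a}
        if mapped == set_b:
            return True  # found an edge-preserving relabeling
    return False

# A 4-cycle under a relabeling is still a 4-cycle; a 4-path is not.
cycle     = [(0, 1), (1, 2), (2, 3), (3, 0)]
relabeled = [(0, 2), (2, 1), (1, 3), (3, 0)]  # same 4-cycle, nodes 1 and 2 swapped
path      = [(0, 1), (1, 2), (2, 3)]
```

`are_isomorphic(cycle, relabeled, 4)` finds the 1-to-2 swap and returns True; `are_isomorphic(cycle, path, 4)` fails immediately on the edge count.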

The Weisfeiler-Leman test

The 1-WL test is an iterative color refinement algorithm:

  1. Initialize: assign each node the same color (or its degree as initial color)
  2. Iterate: each node's new color = hash(own color, sorted multiset of neighbors' colors)
  3. Compare: if the sorted multiset of all colors differs between two graphs, they are non-isomorphic
  4. Repeat until colors stabilize
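The refinement loop above fits in a few lines of plain Python. This is an illustrative sketch, not PyG API; for simplicity it runs a fixed number of rounds instead of testing for stabilization (n rounds always suffice for an n-node graph).

```python
def wl_colors(adj, rounds=3):
    """1-WL color refinement on an adjacency dict {node: [neighbors]}.

    Colors start from node degree; each round, a node's new color is
    a hash of (own color, sorted multiset of neighbor colors).
    Returns the sorted multiset of final colors.
    """
    colors = {v: len(nbrs) for v, nbrs in adj.items()}
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return sorted(colors.values())

# A 4-node path and a 4-node star have the same node and edge counts,
# but their color multisets differ already at initialization.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

Differing multisets (`wl_colors(path) != wl_colors(star)`) prove non-isomorphism; equal multisets prove nothing, as the next paragraph explains.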

If the 1-WL test declares two graphs “potentially isomorphic,” they might still be non-isomorphic. The test has false negatives: it cannot distinguish some non-isomorphic graphs, particularly certain regular graphs where all nodes have the same degree.

GIN: achieving the ceiling

Most GNN layers do not reach the 1-WL ceiling. GCN uses mean aggregation, which loses information about multiset size (it cannot distinguish a node with 3 neighbors all having feature 1 from a node with 1 neighbor with feature 1). GraphSAGE with max aggregation loses even more.
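The mean-versus-sum point can be checked in two lines on the exact case from the paragraph above (plain Python, illustrative helper names):

```python
def mean_agg(neighbor_feats):
    # Mean normalizes away the multiset size.
    return sum(neighbor_feats) / len(neighbor_feats)

def sum_agg(neighbor_feats):
    # Sum preserves multiplicity: 3 neighbors with feature 1 give 3.0.
    return sum(neighbor_feats)

three_ones = [1.0, 1.0, 1.0]  # node with 3 neighbors, all feature 1
one_one    = [1.0]            # node with 1 neighbor, feature 1

mean_agg(three_ones) == mean_agg(one_one)  # both 1.0: mean collapses them
sum_agg(three_ones)  != sum_agg(one_one)   # 3.0 vs 1.0: sum separates them
```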

GIN (Graph Isomorphism Network) is designed to match the 1-WL ceiling:

  • Sum aggregation: preserves the full multiset of neighbor features (including multiplicity). Sum is the only standard aggregation that is injective for multisets.
  • MLP update: an MLP (not a linear layer) ensures the update function is injective. This preserves the distinction between different aggregated messages.
  • Epsilon parameter: a learnable weight on the self-loop ensures the node's own features can be weighted differently from the aggregated neighbors.
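Putting the three ingredients together, the GIN node update is h_v' = MLP((1 + ε) · h_v + Σ_u h_u). A minimal sketch on scalar features, where `gin_update` is an illustrative name and the default `mlp` is a toy stand-in for a learned network:

```python
def gin_update(h, adj, eps=0.0, mlp=None):
    """One GIN layer over scalar node features (illustrative sketch).

    h_v' = MLP((1 + eps) * h_v + sum of neighbor features).
    Sum aggregation keeps the neighbor multiset distinguishable;
    the (nonlinear) MLP keeps the update injective.
    """
    if mlp is None:
        mlp = lambda x: max(0.0, 2.0 * x + 1.0)  # toy stand-in for a learned MLP
    return {
        v: mlp((1.0 + eps) * h[v] + sum(h[u] for u in adj[v]))
        for v in adj
    }

# One update on a triangle with all features 1.0:
# each node sees (1 + 0) * 1 + (1 + 1) = 3, then mlp(3) = 7.0.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
h1 = gin_update({v: 1.0 for v in triangle}, triangle)
```

In PyTorch Geometric the corresponding layer is `GINConv`, which takes a user-supplied MLP and an optional learnable epsilon.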

Why expressiveness matters in practice

Consider two molecular graphs that are non-isomorphic but indistinguishable to 1-WL:

  • Graph A: cyclohexane's carbon skeleton (a single 6-membered ring)
  • Graph B: two disjoint 3-membered rings (the carbon skeletons of two cyclopropane molecules)

Both graphs have six atoms and six bonds, and both are 2-regular: every carbon has degree 2. The 1-WL test (and therefore standard GNNs) cannot distinguish them. A GCN would assign both graphs the same embedding, even though the corresponding chemical properties are completely different.
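This failure is easy to reproduce. The sketch below (plain Python, illustrative names) runs 1-WL color refinement on a 6-cycle and on two disjoint triangles: because every node in both graphs has degree 2, every round hashes the same (color, neighbor-colors) tuple everywhere, and the color histograms stay identical.

```python
def wl_histogram(adj, rounds=6):
    """Sorted multiset of 1-WL colors after refinement."""
    colors = {v: len(nbrs) for v, nbrs in adj.items()}
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return sorted(colors.values())

# Six-cycle: the carbon skeleton of cyclohexane.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
# Two disjoint triangles: two 3-membered rings.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
```

`wl_histogram(hexagon) == wl_histogram(two_triangles)` holds at every round, so 1-WL, and any standard message passing GNN, cannot tell the two graphs apart.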

For drug discovery, this indistinguishability is a real limitation. Graph transformers and higher-order GNNs overcome it.

Beyond 1-WL: higher-order expressiveness

Several approaches exceed the 1-WL ceiling:

  • k-WL and k-FWL: operate on k-tuples of nodes instead of individual nodes. More expressive but exponentially more expensive (O(N^k) complexity).
  • Graph transformers: global attention can capture structural patterns that local message passing misses. Structural positional encodings (Laplacian eigenvectors) provide information beyond what 1-WL computes.
  • Subgraph GNNs: process collections of subgraphs around each node, providing additional structural information.
  • Random features: adding random node features breaks symmetries that 1-WL preserves, increasing expressiveness at the cost of determinism.
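As a concrete taste of structural information beyond 1-WL, in the spirit of the subgraph-based approaches above: counting the triangles each node sits in already separates the two 2-regular graphs that 1-WL confuses. A hedged sketch in plain Python (`triangle_counts` is an illustrative helper, not a library function):

```python
def triangle_counts(adj):
    """Number of triangles each node participates in."""
    counts = {}
    for v, nbrs in adj.items():
        n = list(nbrs)
        # A triangle at v is a pair of v's neighbors that are adjacent.
        counts[v] = sum(
            1
            for i in range(len(n))
            for j in range(i + 1, len(n))
            if n[j] in adj[n[i]]
        )
    return counts

hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
```

Every node of the 6-cycle lies in 0 triangles, while every node of the disjoint triangles lies in 1, so appending this count to the node features lets even a 1-WL-bounded GNN distinguish the two graphs.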

Frequently asked questions

What is graph isomorphism?

Two graphs G1 and G2 are isomorphic if there exists a one-to-one mapping between their nodes that preserves all edge connections. In simple terms: they have the same structure, just with different node labels. Determining whether two graphs are isomorphic is the graph isomorphism problem, which has no known polynomial-time algorithm for general graphs.

What is the Weisfeiler-Leman (WL) test?

The 1-WL test is an iterative algorithm that distinguishes non-isomorphic graphs. At each step, each node is assigned a label based on its own label and the sorted multiset of its neighbors' labels. If the multiset of all labels differs between two graphs at any step, they are not isomorphic. The 1-WL test cannot distinguish all non-isomorphic graphs (it fails on certain regular graphs) but is efficient and practical.

How does GIN relate to the WL test?

The Graph Isomorphism Network (GIN) was designed to be as expressive as the 1-WL test. It uses sum aggregation (not mean or max) and an injective update function (MLP instead of linear). This means GIN can distinguish any two graphs that the 1-WL test can distinguish. It is provably the most expressive standard message passing GNN.

Why does this matter for practical GNN applications?

The expressiveness ceiling determines what structural patterns a GNN can detect. If two different fraud ring topologies are indistinguishable by the 1-WL test, standard GNNs cannot tell them apart. Using maximally expressive architectures (GIN) or higher-order GNNs (k-WL) ensures the model can leverage all available structural information. For molecular property prediction, where subtle structural differences change properties, this is critical.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.