
GNN vs CNN: Graphs vs Images, Irregular vs Regular Structure

A CNN is a GNN on a grid. GNNs generalize convolution from regular pixel grids to arbitrary graph structures where nodes have variable numbers of neighbors in no fixed spatial arrangement.

PyTorch Geometric

TL;DR

  • CNNs operate on regular grids: every pixel has the same number of neighbors in a fixed spatial layout. GNNs operate on irregular graphs: each node has a different number of neighbors with no fixed arrangement.
  • Both use the same principle: local aggregation with shared weights. CNNs apply a fixed kernel to grid neighborhoods. GNNs apply a shared transformation to variable-size neighborhoods with permutation-invariant aggregation.
  • A CNN IS a special case of a GNN: an image is a graph with grid connectivity. The GNN framework generalizes CNNs to arbitrary topology.
  • CNNs cannot process graphs: irregular neighborhoods break fixed-size kernels. GNNs cannot match CNN efficiency on grids: the regular structure allows optimizations (FFT, tensor operations) that irregular graphs preclude.
  • GNNs handle what CNNs cannot: social networks, molecules, supply chains, relational databases. Any data with irregular relationships between entities.

CNNs and GNNs share the same core idea (local aggregation with shared weights) but apply it to fundamentally different data structures. A CNN convolves a fixed-size kernel over a regular pixel grid where every pixel has exactly the same neighborhood structure. A GNN aggregates information from a variable-size neighborhood on an irregular graph where node A might have 3 neighbors and node B might have 300.

Regular vs irregular structure

Images: the regular grid

An image is a 2D grid. Every interior pixel has exactly 8 neighbors (3x3 neighborhood). The spatial relationships are fixed: the pixel above is always above, the pixel to the right is always to the right. This regularity enables:

  • Fixed-size kernels: a 3x3 kernel has exactly 9 weights, applied identically everywhere
  • Translation equivariance: the same pattern is detected anywhere in the image
  • Efficient computation: convolution on grids can be implemented as matrix multiplication or FFT, highly optimized on GPUs
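To make the fixed-kernel idea concrete, here is a minimal NumPy sketch (not the optimized FFT or tensor-core path) of a 3x3 convolution: the same 9 weights are applied to every interior pixel's fixed-layout neighborhood.

```python
import numpy as np

def conv2d_3x3(image, kernel):
    """Naive 3x3 convolution: the same 9 shared weights are applied to
    every interior pixel's fixed-layout grid neighborhood."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0  # mean filter: one shared set of weights
out = conv2d_3x3(image, kernel)  # out[0, 0] is the mean of the top-left 3x3 patch
```

Real implementations never loop like this; the regular grid lets them lower the same computation to batched matrix multiplies or FFTs.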

Graphs: irregular topology

A graph has no fixed neighborhood structure. In a social network, one user might have 5 friends and another might have 5,000. There is no spatial layout. This irregularity requires:

  • Variable-size aggregation: the “kernel” must handle any number of neighbors. Permutation-invariant functions (sum, mean, max) replace fixed kernel weights.
  • Permutation invariance: the order of neighbors does not matter (unlike the fixed up/down/left/right of pixels)
  • Sparse computation: only connected node pairs interact, using sparse matrix operations
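The three requirements above can be sketched together in plain NumPy (a simplified stand-in for what libraries like PyTorch Geometric do with scatter operations): mean aggregation over a sparse edge list handles any degree, and neighbor order never enters the computation.

```python
import numpy as np

def mean_aggregate(x, edge_index):
    """Mean-aggregate neighbor features given a 2 x E edge list
    (row 0 = source nodes, row 1 = destination nodes).
    Works for any degree; neighbor order is irrelevant."""
    src, dst = edge_index
    n, d = x.shape
    out = np.zeros((n, d))
    deg = np.zeros(n)
    np.add.at(out, dst, x[src])   # scatter-sum neighbor features
    np.add.at(deg, dst, 1.0)      # count each node's in-neighbors
    deg = np.maximum(deg, 1.0)    # isolated nodes keep a zero vector
    return out / deg[:, None]

# 3 nodes with features 1, 2, 3; node 0 receives from {1, 2}, node 1 from {0, 2}
x = np.array([[1.0], [2.0], [3.0]])
edge_index = np.array([[0, 1, 2, 2],
                       [1, 0, 0, 1]])
out = mean_aggregate(x, edge_index)  # node 0 -> 2.5, node 1 -> 2.0, node 2 -> 0.0
```

Only the `E` existing edges are touched, which is why sparse scatter operations scale to graphs whose dense adjacency matrix would never fit in memory.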

The convolution analogy

Both architectures perform local aggregation:

  • CNN: new_pixel = sum(kernel_weight * neighbor_pixel for each fixed neighbor position)
  • GNN: new_node = aggregate(transform(neighbor_feature) for each neighbor)

The difference is how “neighbor” is defined. For CNNs, neighbors are defined by grid position (up, down, left, right, diagonals). For GNNs, neighbors are defined by edge connections in the graph. The CNN kernel has position-specific weights (one weight for the pixel above, a different weight for the pixel to the right). The GNN transformation is shared across all neighbors (because there is no fixed spatial relationship).
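A tiny NumPy sketch (illustrative only, not the PyG API) shows the GNN side of the analogy: one weight matrix is shared by every neighbor, and a permutation-invariant sum aggregates them, so shuffling the neighbor order cannot change the output.

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(4, 8))          # one transform shared by ALL neighbors
neighbors = rng.normal(size=(5, 4))  # 5 neighbor feature vectors, arbitrary order

agg = (neighbors @ W).sum(axis=0)              # transform, then order-free sum
agg_shuffled = (neighbors[::-1] @ W).sum(axis=0)  # reversed neighbor order
assert np.allclose(agg, agg_shuffled)          # identical result
```

A CNN kernel, by contrast, would assign a different weight matrix to each of the 5 positions, which is exactly what graph neighborhoods cannot support.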

Why you cannot use CNNs on graphs

Three fundamental problems:

  1. Variable degree: a 3x3 CNN kernel assumes 8 neighbors. Node A has 3 neighbors; node B has 300. No fixed kernel size works.
  2. No spatial ordering: a CNN kernel assigns weight_1 to the top-left pixel, weight_2 to the top-center, etc. Graph neighbors have no such ordering. Which neighbor gets weight_1?
  3. Permutation sensitivity: relabeling graph nodes changes the adjacency matrix but not the graph. CNNs on adjacency matrices would produce different outputs for the same graph.
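Problem 3 is easy to demonstrate: relabel the nodes of a 3-node path graph and the adjacency matrix, viewed as an "image", changes, even though the graph is identical. A permutation-invariant summary such as the degree multiset does not change.

```python
import numpy as np

A1 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])   # path graph 0-1-2

P = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])    # relabel: swap nodes 0 and 1

A2 = P @ A1 @ P.T            # same graph, different adjacency matrix

assert not np.array_equal(A1, A2)               # a CNN sees two different inputs
assert sorted(A1.sum(0)) == sorted(A2.sum(0))   # degree multiset is unchanged
```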

Why you should not use GNNs on images

While GNNs can process images (treat pixels as nodes on a grid graph), they should not. The regular grid structure of images enables optimizations that GNNs cannot exploit:

  • CNNs use dense tensor operations optimized for GPU parallelism
  • CNNs exploit translation equivariance (the same kernel slides across the image)
  • CNNs use well-developed architectures (ResNet, EfficientNet) with decades of optimization

Using a GNN on an image would be slower and less effective because it ignores the valuable regular structure.

When each architecture applies

  • Images, video, audio spectrograms: CNN (regular grid structure)
  • Molecules, proteins: GNN (atoms/residues with variable bonds/contacts)
  • Social networks, citation networks: GNN (users/papers with variable connections)
  • Relational databases: GNN (rows with foreign key connections)
  • Point clouds, meshes: GNN (irregular 3D structure)
  • Road networks, circuits: GNN (nodes with variable degree)

Frequently asked questions

How is graph convolution related to image convolution?

An image pixel has a fixed set of neighbors (3x3 grid). A graph node has a variable number of neighbors. Image convolution applies the same fixed-size kernel to every pixel's grid neighborhood. Graph convolution aggregates features from a node's variable-size neighborhood and applies a shared transformation. The underlying principle is the same (local aggregation + shared weights), but graph convolution generalizes to irregular structure.

Why can't CNNs process graphs directly?

CNNs require a regular grid structure: every pixel has the same number of neighbors in the same spatial arrangement. Graphs are irregular: node A might have 3 neighbors, node B might have 100. There is no fixed spatial arrangement. Graph convolution handles this irregularity through permutation-invariant aggregation (sum, mean, max) instead of fixed kernel weights.

Can you represent a graph as an image?

You could represent the adjacency matrix as an image and apply CNNs, but this loses: (1) permutation invariance (reordering nodes changes the image but not the graph), (2) sparsity efficiency (the adjacency matrix is mostly zeros), and (3) scalability (a 1M-node graph would require a 1M x 1M image). Graph representations are fundamentally better for graph-structured data.

Is a CNN a special case of a GNN?

Yes. An image is a graph where each pixel is a node connected to its grid neighbors. A CNN is a GNN with a fixed neighborhood structure (the grid), fixed aggregation weights (the kernel), and weight sharing across all positions. GNNs generalize CNNs by removing the regular grid assumption, making them applicable to any graph structure.
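This reduction can be made explicit: the sketch below (plain Python, illustrative) builds the edge list of the 8-connected grid graph underlying an image. Every interior pixel-node has exactly 8 neighbors, so a GNN on this graph sees the same neighborhoods a 3x3 CNN kernel does, just without the kernel's position-specific weights.

```python
import numpy as np

def grid_edges(h, w):
    """Directed edge list of the 8-connected grid graph of an h x w image:
    each pixel-node links to its horizontal, vertical, and diagonal neighbors."""
    edges = []
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        edges.append((i * w + j, ni * w + nj))
    return edges

edges = grid_edges(4, 4)
deg = np.zeros(16, dtype=int)
for s, _ in edges:
    deg[s] += 1
# Interior pixels have the full 8 neighbors; corner pixels only 3.
assert deg[1 * 4 + 1] == 8 and deg[0] == 3
```

Running a GNN over this graph is valid but, as the section above argues, wasteful: the fixed grid layout that makes CNN kernels and tensor operations so fast is exactly what this edge list throws away.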

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.