Spatial graph neural network methods define convolution as a local neighborhood operation: each node aggregates feature vectors from its direct neighbors and updates its own representation, operating directly in the graph domain rather than in a spectral frequency domain. Every modern GNN layer (GCNConv, GATConv, SAGEConv, GINConv) is a spatial method. Spatial methods are scalable (O(|E|) per layer), naturally inductive (they generalize to unseen nodes), and compatible with mini-batch training via neighbor sampling, which makes them the only practical choice for enterprise-scale graph neural networks.
Why it matters for enterprise data
Enterprise graphs have millions of nodes and billions of edges. Spatial methods are the only GNN paradigm that can handle this scale:
- Scalability: Each node only looks at its neighbors. Computation scales with the number of edges, not with the square of the number of nodes.
- Inductive inference: New customers, transactions, and products can be scored immediately without retraining. The learned aggregation function works on any node.
- Mini-batch training: Neighbor sampling creates manageable mini-batches from massive graphs, something spectral methods cannot do.
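The O(|E|) scaling claim can be illustrated with a minimal sketch (plain NumPy, not PyG's actual implementation): one round of mean aggregation touches each edge exactly once, so the cost is linear in the number of edges.

```python
import numpy as np

def mean_aggregate(x, edge_index):
    """One spatial aggregation step: each node averages its neighbors' features.

    x: (num_nodes, dim) feature matrix
    edge_index: (2, num_edges) array of (source, target) pairs
    Cost is O(|E|): each edge is visited exactly once.
    """
    num_nodes, dim = x.shape
    out = np.zeros_like(x)
    deg = np.zeros(num_nodes)
    for src, dst in edge_index.T:      # one pass over the edge list
        out[dst] += x[src]
        deg[dst] += 1
    deg[deg == 0] = 1                  # isolated nodes keep zero features
    return out / deg[:, None]

# Tiny path graph 0 - 1 - 2, with edges in both directions
x = np.array([[1.0], [2.0], [4.0]])
edges = np.array([[0, 1, 1, 2], [1, 0, 2, 1]])
print(mean_aggregate(x, edges))  # node 1 averages nodes 0 and 2 -> 2.5
```

Because the loop is over edges rather than node pairs, the same procedure scales to billion-edge graphs when combined with sampling.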
The spatial methods family
from torch_geometric.nn import GCNConv, GATConv, SAGEConv, GINConv
from torch.nn import Sequential, Linear, ReLU
# GCNConv: degree-normalized mean aggregation + linear transform
# Best for: homophilous graphs, simple baselines, maximum speed
gcn = GCNConv(in_channels=64, out_channels=64)
# GATConv: learned attention weights per neighbor
# Best for: variable neighbor importance, interpretable weights
gat = GATConv(64, 64, heads=8)
# SAGEConv: sample + aggregate with separate self/neighbor transforms
# Best for: large-scale inductive settings, production deployment
sage = SAGEConv(64, 64, aggr='mean')
# GINConv: sum aggregation + MLP for maximum expressiveness
# Best for: tasks requiring fine structural discrimination
gin_nn = Sequential(Linear(64, 64), ReLU(), Linear(64, 64))
gin = GINConv(gin_nn)
# All four are spatial methods. All inherit from MessagePassing.
# The difference is HOW they aggregate neighbor information.Four spatial methods, four aggregation strategies. All operate locally on neighborhoods. PyG's MessagePassing base class provides the shared infrastructure.
Concrete example: choosing a spatial method for enterprise churn prediction
A telecom company building a churn model on a customer call graph:
- Graph: 50M customers, 2B call/text edges
- Features: tenure, plan, monthly charge, support calls
- Task: predict which customers will churn next month
Architecture choice:
- SAGEConv for the layers (designed for large-scale inductive learning)
- NeighborLoader with [15, 10] sampling for mini-batch training
- 2 layers (customer sees 2-hop social network influence)
- Linear classification head
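The SAGE-style update underlying this architecture can be sketched in plain NumPy (feature dimensions and the ReLU placement here are illustrative, not the production model): separate weight matrices transform a node's own features and its aggregated neighborhood, and stacking two layers gives each customer a 2-hop receptive field.

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_layer(x, edge_index, w_self, w_nbr):
    """GraphSAGE-style update: h_v = ReLU(W_self x_v + W_nbr mean(x_u for u in N(v)))."""
    num_nodes = x.shape[0]
    agg = np.zeros_like(x)
    deg = np.zeros(num_nodes)
    for src, dst in edge_index.T:
        agg[dst] += x[src]
        deg[dst] += 1
    deg[deg == 0] = 1
    agg = agg / deg[:, None]
    return np.maximum(x @ w_self + agg @ w_nbr, 0.0)  # ReLU

# Hypothetical shapes: 4 input features (tenure, plan, charge, support calls) -> 8 hidden
x = rng.standard_normal((5, 4))                  # 5 customers, 4 features each
edges = np.array([[0, 1, 2, 3], [1, 2, 3, 4]])   # toy call graph
h = sage_layer(x, edges, rng.standard_normal((4, 8)), rng.standard_normal((4, 8)))
h = sage_layer(h, edges, rng.standard_normal((8, 8)), rng.standard_normal((8, 8)))
print(h.shape)  # (5, 8): each customer embedding now reflects 2-hop influence
```

In production, the mean over all neighbors is replaced by the mean over the [15, 10] sampled neighborhood that NeighborLoader delivers per mini-batch.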
Spatial vs. spectral: a practical comparison
- Computation: Spatial O(|E|) per layer. Spectral O(n^3) for eigendecomposition + O(n^2) per layer.
- Scalability: Spatial works on billion-edge graphs with sampling. Spectral is limited to thousands of nodes.
- Inductive: Spatial generalizes to new nodes. Spectral is graph-specific.
- Theory: Spectral provides cleaner mathematical foundations. Spatial is more intuitive and practical.
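Plugging the churn example's numbers into these bounds makes the gap concrete (back-of-envelope, assuming dense O(n^2) spectral filtering per layer):

```python
n = 50_000_000        # nodes in the customer graph
e = 2_000_000_000     # call/text edges

spectral_per_layer = n ** 2   # dense spectral filtering: O(n^2) per layer
spatial_per_layer = e         # message passing: O(|E|) per layer

print(spectral_per_layer / spatial_per_layer)  # -> 1250000.0
```

A factor of over a million per layer, before even counting the one-off O(n^3) eigendecomposition.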
Limitations and what comes next
- Local receptive field: Each layer sees only 1-hop neighbors. k layers = k hops. Long-range dependencies require many layers, which causes over-smoothing.
- Expressiveness ceiling: All spatial methods are bounded by the 1-WL test. Positional and structural encodings help overcome this.
- No global context: Spatial methods cannot capture graph-wide patterns without stacking many layers. Graph transformers address this with global attention.
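The 1-WL ceiling is easy to exhibit (a self-contained sketch of color refinement, not a GNN library call): two disjoint triangles and one 6-cycle are non-isomorphic, yet 1-WL, and therefore any plain message-passing GNN, assigns them identical color histograms because every node is degree-2 with an identical neighborhood signature.

```python
from collections import Counter

def wl_colors(edges, num_nodes, rounds=3):
    """1-WL color refinement: the expressiveness bound for message-passing GNNs."""
    adj = {v: [] for v in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colors = {v: 0 for v in range(num_nodes)}   # uniform initial features
    for _ in range(rounds):
        # refine: new color = hash of (own color, sorted multiset of neighbor colors)
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                  for v in range(num_nodes)}
    return Counter(colors.values())

two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
six_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(wl_colors(two_triangles, 6) == wl_colors(six_cycle, 6))  # -> True
```

Positional encodings (e.g. appending distinct node identifiers or Laplacian eigenvectors to the initial features) break this symmetry, which is why they help push past the 1-WL bound.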